A nonparametric adaptation theory is developed for the construction
of confidence intervals for linear functionals. A between
class modulus of continuity captures the expected length of adaptive 
confidence intervals. Sharp lower bounds are given for the expected 
length and an ordered modulus of continuity is used to construct 
adaptive confidence procedures which are within a constant factor 
of the lower bounds. In addition, minimax theory over nonconvex 
parameter spaces is developed. 

1. Introduction. The problem of estimating a linear functional occupies
a central position in nonparametric function estimation. The theory is most
complete in the Gaussian settings

$$dY(t) = f(t)\,dt + n^{-1/2}\,dW(t), \qquad -\tfrac12 \le t \le \tfrac12, \tag{1}$$

where $W(t)$ is standard Brownian motion, and

$$Y(i) = f(i) + n^{-1/2} Z_i, \qquad i \in \mathcal{I}, \tag{2}$$

where $Z_i$ are i.i.d. standard normal random variables and $\mathcal{I}$ is a finite or
countably infinite index set. In particular, minimax estimation theory has
been well developed in Ibragimov and Hasminskii (1984), Donoho and Liu
(1991) and Donoho (1994).

Confidence sets also play a fundamental role in statistical inference. In
the context of nonparametric function estimation, variable size confidence
intervals, bands and balls have received particular attention recently. For
any confidence set there are two main interrelated issues which need to
be considered together: coverage probability and the expected size of the
confidence set.

One common technique for constructing confidence bands and intervals is
through the bootstrap. In this context it has been noted that intervals based
on the bootstrap often have poor coverage probability. See, for example, Hall
(1992) and Härdle and Marron (1991). Picard and Tribouley (2000) construct
adaptive confidence intervals for functions at a point using a wavelet
method; these intervals achieve optimal coverage accuracy up to a logarithmic
factor, although in this case the issue of optimal expected length is not addressed.
On the other hand, Li (1989), Beran and Dümbgen (1998) and Genovese and
Wasserman (2002) have constructed confidence balls which guarantee coverage
probability. Closer to the present work, adaptive confidence bands have
been constructed in the special case of shape restricted functions. In this
context Hengartner and Stark (1995) and Dümbgen (1998) give a variable
width confidence band which adapts to local smoothness while maintaining
a given level of coverage probability.

In this paper we focus on the construction of confidence intervals for
linear functionals which adapt to the unknown function. This adaptation
problem can be made precise by considering collections of parameter spaces
$\{\mathcal F_j : j \in J\}$, where $J$ is some index set. For such a collection of parameter
spaces the confidence interval should have a given coverage probability over
the union of the parameter spaces. Subject to this constraint the goal is
to minimize the maximum expected length simultaneously over each of the
parameter spaces.

For example, consider the simple and most easily explained case of two
nested spaces, $\mathcal F_1 \subset \mathcal F$. An adaptive confidence interval must attain optimal
expected length performance over both $\mathcal F_1$ and $\mathcal F$ while satisfying a given
coverage probability over $\mathcal F$. More specifically, write $\mathcal I_{\alpha,\mathcal F}$ for the collection of
all confidence intervals which cover the linear functional $Tf$ with minimum
coverage probability of at least $1-\alpha$ over the parameter space $\mathcal F$. Denote by
$L(CI,\mathcal G) = \sup_{f\in\mathcal G} E_f(L(CI))$ the maximum expected length of a confidence
interval $CI$ over $\mathcal G$, where $L(CI)$ is the length of $CI$. Then a benchmark
for the evaluation of the maximum expected length over $\mathcal F_1$ for any $CI \in \mathcal I_{\alpha,\mathcal F}$
is given by

$$L^*_\alpha(\mathcal F_1,\mathcal F) = \inf_{CI \in \mathcal I_{\alpha,\mathcal F}} L(CI,\mathcal F_1). \tag{3}$$

In particular, when $\mathcal F_1 = \mathcal F$ set $L^*_\alpha(\mathcal F) = L^*_\alpha(\mathcal F,\mathcal F)$, which gives the minimax
expected length of confidence intervals of level $1-\alpha$ over $\mathcal F$. For convex $\mathcal F$,
Donoho (1994) constructed fixed length intervals centered at affine estimators
which have length within a small constant factor of $L^*_\alpha(\mathcal F)$.

The major result in the present paper is the construction of confidence
intervals which have expected length within a constant factor of $L^*_\alpha(\mathcal F_j,\mathcal F)$
simultaneously over a collection of convex parameter spaces $\mathcal F_j$, where
$\mathcal F = \bigcup_j \mathcal F_j$. The construction of such intervals is general and is applicable to
collections of arbitrary convex parameter spaces. It is shown in Cai and Low
(2003) that in particular cases, such as collections of convex functions, the
general procedure can be modified to yield simple and easily implementable
procedures.

The main technical tools used in the derivation of the general adaptive
confidence intervals are geometric quantities, the ordered and between class
moduli of continuity, which are defined as follows. For a linear functional
$T$ and parameter spaces $\mathcal F$ and $\mathcal G$ there are ordered moduli of continuity
$\omega(\varepsilon,\mathcal F,\mathcal G)$ associated with the Gaussian models (1) and (2), defined by

$$\omega(\varepsilon,\mathcal F,\mathcal G) = \sup\{Tg - Tf : \|g-f\|_2 \le \varepsilon,\ f\in\mathcal F,\ g\in\mathcal G\}, \tag{4}$$

where $\|\cdot\|_2$ is the $L_2(-\tfrac12,\tfrac12)$ function norm in the white noise model (1)
and the $\ell_2$ sequence norm over the index set $\mathcal I$ in the Gaussian model (2).

As we shall give a unified treatment of both models, it is convenient in the
notation used throughout the paper not to distinguish the function norm
and the sequence norm. It is implicit that for results concerning the white
noise model (1) the notation $\|\cdot\|_2$ always refers to the $L_2$ function norm,
whereas for the sequence model (2) it always refers to the $\ell_2$ sequence norm.

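When $\mathcal G = \mathcal F$, $\omega(\varepsilon,\mathcal F,\mathcal F)$ is the modulus of continuity over $\mathcal F$ introduced by
Donoho and Liu (1991) and will be denoted by $\omega(\varepsilon,\mathcal F)$.

For two parameter spaces $\mathcal F$ and $\mathcal G$ and a given linear functional $T$, the between
class modulus of continuity is defined as $\omega_+(\varepsilon,\mathcal F,\mathcal G) = \max\{\omega(\varepsilon,\mathcal F,\mathcal G),\ \omega(\varepsilon,\mathcal G,\mathcal F)\}$,
or equivalently

$$\omega_+(\varepsilon,\mathcal F,\mathcal G) = \sup\{|Tg - Tf| : \|g-f\|_2 \le \varepsilon,\ f\in\mathcal F,\ g\in\mathcal G\}. \tag{5}$$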

The between class and ordered moduli were first introduced in Cai and
Low (2002) in the context of adaptive estimation under mean squared error,
where they were shown to be instrumental in characterizing the possible
degree of adaptability over two convex classes $\mathcal F$ and $\mathcal G$, in the same way
that the modulus of continuity $\omega(\varepsilon,\mathcal F)$ used by Donoho and Liu (1991) and
Donoho (1994) captures the minimax difficulty of estimation over a single
convex parameter space $\mathcal F$.
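As a concrete, purely hypothetical illustration of (4) and (5), consider a finite-dimensional instance of the sequence model (2) in which $\mathcal F = \{|f_i| \le a_i\}$ and $\mathcal G = \{|g_i| \le b_i\}$ are centred hyperrectangles and $Tf = \sum_i c_i f_i$. These spaces, and the function name `box_modulus`, are assumptions made only for this sketch; they are not parameter spaces treated in the paper. Here $h = g - f$ ranges over the box $|h_i| \le a_i + b_i$, so the modulus is the value of a small convex program, maximize $\sum_i c_i h_i$ subject to $|h_i| \le a_i + b_i$ and $\|h\|_2 \le \varepsilon$, whose solution has the water-filling form $h_i = \operatorname{sign}(c_i)\min(a_i + b_i, t|c_i|)$. Because that box is symmetric, in this toy case $\omega(\varepsilon,\mathcal F,\mathcal G) = \omega(\varepsilon,\mathcal G,\mathcal F) = \omega_+(\varepsilon,\mathcal F,\mathcal G)$.

```python
import math


def box_modulus(eps, c, a, b):
    """Ordered modulus (4) for T f = sum(c_i f_i), F = {|f_i| <= a_i}, G = {|g_i| <= b_i}.

    Hypothetical example: h = g - f ranges over |h_i| <= a_i + b_i, so we maximise
    sum |c_i| h_i subject to 0 <= h_i <= a_i + b_i and ||h||_2 <= eps.
    """
    active = [(abs(ci), ai + bi) for ci, ai, bi in zip(c, a, b) if ci != 0]
    if not active:
        return 0.0
    # If the whole box fits inside the eps-ball, only the box constraint binds.
    if math.sqrt(sum(B * B for _, B in active)) <= eps:
        return sum(w * B for w, B in active)

    def norm_at(t):
        # ||h(t)||_2 for the water-filling choice h_i = min(B_i, t |c_i|)
        return math.sqrt(sum(min(B, t * w) ** 2 for w, B in active))

    lo, hi = 0.0, 1.0
    while norm_at(hi) < eps:          # bracket the t with ||h(t)||_2 = eps
        hi *= 2.0
    for _ in range(200):              # bisection on the nondecreasing map t -> ||h(t)||_2
        mid = 0.5 * (lo + hi)
        if norm_at(mid) < eps:
            lo = mid
        else:
            hi = mid
    t = 0.5 * (lo + hi)
    return sum(w * min(B, t * w) for w, B in active)


# Two coordinates with equal weights: linear growth in eps until the box saturates.
print(box_modulus(1.0, [1.0, 1.0], [1.0, 1.0], [1.0, 1.0]))   # about sqrt(2)
print(box_modulus(10.0, [1.0, 1.0], [1.0, 1.0], [1.0, 1.0]))  # 4.0, box constraint binds
```

For a single coordinate the modulus is simply $\min(\varepsilon, a_1 + b_1)|c_1|$, which is the simplest case of the concave, Hölder-type behaviour exploited later in the paper.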

The paper is organized as follows. Section 2 covers adaptation over two
convex parameter spaces $\mathcal F_1$ and $\mathcal F_2$, where the theory is most easily understood.
A lower bound based on the between class modulus as defined in (5)
is given for $L^*_\alpha(\mathcal F_1,\mathcal F)$, where $\mathcal F = \mathcal F_1 \cup \mathcal F_2$. An adaptive confidence interval
attaining this bound is also constructed by using the ordered moduli as given
in (4). Various examples are used to illustrate the adaptation theory.

More generally, let $\{\mathcal F_j, j \in J\}$ be a collection of convex parameter spaces
with nonempty intersections and let $\mathcal F = \bigcup_j \mathcal F_j$. The goal is then to simultaneously
minimize $L(CI,\mathcal F_j)$ for confidence intervals $CI \in \mathcal I_{\alpha,\mathcal F}$. For each
parameter space $\mathcal F_j$, $L^*_\alpha(\mathcal F_j,\mathcal F)$ provides a lower bound on the maximum
expected length over $\mathcal F_j$ for any $CI \in \mathcal I_{\alpha,\mathcal F}$. In Section 3 a complete treatment
is given for nested $\mathcal F_j$, possibly infinite in number. For any collection
of nested convex parameter spaces a variable length confidence interval is
constructed which, for a given level of coverage, has expected length within
a constant factor of the minimum expected length simultaneously over all
parameter spaces in the collection.

Section 4 treats the case of a general finite collection of convex parameter
spaces. A more complicated procedure results in an interval which also has
expected length within a constant factor of the minimum expected length,
although the constant factor now depends on the number of parameter spaces
in the collection. Finally, Section 5 shows by example that the rate
of growth in this constant factor as a function of the number of parameter
spaces cannot in general be avoided. Section 5 also uses the adaptation theory
developed in this paper to extend the minimax theory to a finite union
of convex parameter spaces.

2. Adaptation over two parameter spaces. In this section we consider
adaptation over two parameter spaces. For the development of this theory
it is convenient, for a given $\alpha$, to provide a benchmark for the maximum
expected length over $\mathcal F_1$ of confidence intervals with a given coverage probability
of $1-\alpha$ over $\mathcal F = \mathcal F_1 \cup \mathcal F_2$, namely to provide a lower bound for
$L^*_\alpha(\mathcal F_1,\mathcal F)$ as defined in (3). This benchmark is given in Section 2.1 for arbitrary
parameter spaces.

We give a complete treatment of adaptation when the two parameter 
spaces are convex. In this case adaptive intervals attaining the lower bound 
given in Section 2.1 are constructed. The adaptive procedure is given in 
Section 2.2. Examples illustrating the theory are given in Section 2.3. 

It is convenient to write $a_l \asymp b_l$ whenever $0 < \liminf a_l/b_l \le \limsup a_l/b_l < \infty$,
where $l$ ranges over either a continuous or discrete index set.

2.1. Lower bound on the length of confidence intervals. The following
simple two-point Normal mean problem is the basis for a surprisingly useful
general lower bound on the expected length of $1-\alpha$ level confidence
intervals. We shall see later that the two-point bound is easy to apply for
adaptation theory because each point can be chosen to lie in a different
parameter space. Previous work on confidence intervals for bounded Normal
means, as in Pratt (1961), Zeytinoglu and Mintz (1984) and Stark (1992),
is useful for minimax theory but it is not applicable to general adaptation
problems.

Let $X \sim N(\theta,\sigma^2)$ and suppose that $\theta \in \Theta = \{\theta_0,\theta_1\}$ where $\theta_0 < \theta_1$. Consider
the following simple statistical decision theory problem: construct confidence
intervals $CI(X)$ for $\theta$ which have smallest expected length under $\theta_0$
subject to the coverage constraint

$$P_\theta(\theta \in CI(X)) \ge 1-\alpha \quad \text{for } \theta \in \Theta.$$

Throughout the paper set $z_\alpha = \Phi^{-1}(1-\alpha)$, where $\Phi$ is the cumulative distribution
function of a standard Normal distribution. In addition write $L(CI)$ for the
length of a confidence interval $CI$.

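Proposition 1. Let $X \sim N(\theta,\sigma^2)$ and suppose that $\theta \in \Theta = \{\theta_0,\theta_1\}$
where $\theta_0 < \theta_1$. Let $CI(X)$ be a $1-\alpha$ level confidence interval for $\theta$. Then

$$E_{\theta_i} L(CI(X)) \ge (\theta_1-\theta_0)\Big(1-\alpha-\Phi\Big(\frac{\theta_1-\theta_0}{\sigma}-z_\alpha\Big)\Big)_+ \tag{6}$$

for $i = 0, 1$. Moreover there exists a confidence interval which attains the
lower bounds simultaneously for both $i = 0$ and $i = 1$.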


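Proof. It is clear that it suffices to consider confidence intervals $CI(X)$
of three possible forms: $[\theta_0,\theta_1]$, $\{\theta_0\}$ and $\{\theta_1\}$. The problem is then to minimize
$P_{\theta_0}(CI(X) = [\theta_0,\theta_1])$ subject to the constraints $P_{\theta_0}(CI(X) = \{\theta_1\}) \le \alpha$
and $P_{\theta_1}(CI(X) = \{\theta_0\}) \le \alpha$.

It follows from the Neyman-Pearson lemma that, subject to the constraint
that $P_{\theta_1}(CI(X) = \{\theta_0\}) \le \alpha$,

$$P_{\theta_0}(CI(X) = \{\theta_0\}) \le \Phi\Big(\frac{\theta_1-\theta_0}{\sigma}-z_\alpha\Big).$$

Hence

$$\begin{aligned}
E_{\theta_0} L(CI(X)) &= (\theta_1-\theta_0)\,P_{\theta_0}(CI(X) = [\theta_0,\theta_1])\\
&= (\theta_1-\theta_0)\big(1 - P_{\theta_0}(CI(X) = \{\theta_1\}) - P_{\theta_0}(CI(X) = \{\theta_0\})\big)\\
&\ge (\theta_1-\theta_0)\Big(1-\alpha-\Phi\Big(\frac{\theta_1-\theta_0}{\sigma}-z_\alpha\Big)\Big).
\end{aligned}$$

Since the expected length is nonnegative, the bound also holds with the positive part. The bound for $\theta_1$ follows similarly.

It is easy to see that an interval attaining the lower bound for $\theta_0$ and $\theta_1$
is given by

$$CI(X) = \begin{cases}
\{\theta_0\}, & \text{if } X < \theta_1 - z_\alpha\sigma,\\
[\theta_0,\theta_1], & \text{if } \theta_1 - z_\alpha\sigma \le X \le \theta_0 + z_\alpha\sigma,\\
\{\theta_1\}, & \text{if } X > \theta_0 + z_\alpha\sigma,
\end{cases}$$

when $\theta_1 - z_\alpha\sigma < \theta_0 + z_\alpha\sigma$. Otherwise set

$$CI(X) = \begin{cases}
\{\theta_0\}, & \text{if } X < (\theta_0+\theta_1)/2,\\
\{\theta_1\}, & \text{if } X \ge (\theta_0+\theta_1)/2.
\end{cases}$$

In this case the confidence interval always has zero length and coverage of
at least $1-\alpha$. □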
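The attaining interval in the proof can be checked exactly with the Normal CDF. The sketch below, a minimal illustration using only Python's standard library, implements that interval, computes its exact coverage at both points and its expected length under $\theta_0$, and compares the latter with the right-hand side of (6). The function names and the numerical values are hypothetical choices for this illustration.

```python
from statistics import NormalDist

STD_NORMAL = NormalDist()


def two_point_interval(x, theta0, theta1, sigma, alpha):
    """Attaining interval from the proof of Proposition 1, returned as (lower, upper)."""
    z = STD_NORMAL.inv_cdf(1 - alpha)
    if theta1 - z * sigma < theta0 + z * sigma:
        if x < theta1 - z * sigma:
            return (theta0, theta0)
        if x > theta0 + z * sigma:
            return (theta1, theta1)
        return (theta0, theta1)
    # Well separated points: always a single point, chosen by the midpoint rule.
    return (theta0, theta0) if x < 0.5 * (theta0 + theta1) else (theta1, theta1)


def exact_properties(theta0, theta1, sigma, alpha):
    """Exact coverage at theta0 and theta1, and expected length under theta0."""
    z = STD_NORMAL.inv_cdf(1 - alpha)

    def cdf_at(t, mean):
        # P(X <= t) for X ~ N(mean, sigma^2)
        return STD_NORMAL.cdf((t - mean) / sigma)

    if theta1 - z * sigma < theta0 + z * sigma:
        lo_cut, hi_cut = theta1 - z * sigma, theta0 + z * sigma
        cover0 = cdf_at(hi_cut, theta0)        # misses theta0 only if X > hi_cut
        cover1 = 1 - cdf_at(lo_cut, theta1)    # misses theta1 only if X < lo_cut
        p_full = cdf_at(hi_cut, theta0) - cdf_at(lo_cut, theta0)
    else:
        mid = 0.5 * (theta0 + theta1)
        cover0, cover1, p_full = cdf_at(mid, theta0), 1 - cdf_at(mid, theta1), 0.0
    return cover0, cover1, (theta1 - theta0) * p_full


def prop1_bound(theta0, theta1, sigma, alpha):
    """Right-hand side of (6)."""
    z = STD_NORMAL.inv_cdf(1 - alpha)
    d = theta1 - theta0
    return d * max(0.0, 1 - alpha - STD_NORMAL.cdf(d / sigma - z))


cov0, cov1, length0 = exact_properties(0.0, 1.0, 1.0, 0.05)
print(cov0, cov1, length0, prop1_bound(0.0, 1.0, 1.0, 0.05))
```

In the overlapping case the coverage at both points is exactly $1-\alpha$ and the expected length coincides with (6); when $\theta_1-\theta_0 \ge 2z_\alpha\sigma$ both the bound and the expected length are zero.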


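Based on the two-point bound given in Proposition 1, the following theorem
gives a lower bound for the infinite-dimensional Gaussian models.

Theorem 1. Let $0 < \alpha < \tfrac12$ and let $\mathcal F_1 \subset \mathcal F$ be two parameter spaces.
Then

$$L^*_\alpha(\mathcal F_1,\mathcal F) \ge \Big(\frac12-\alpha\Big)\,\omega_+\Big(\frac{z_\alpha}{\sqrt n},\mathcal F_1,\mathcal F\Big), \tag{7}$$

where $L^*_\alpha(\mathcal F_1,\mathcal F)$ is defined in (3) and $\omega_+(\varepsilon,\mathcal F_1,\mathcal F)$ is the between class modulus
as given in (5).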


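Proof. We shall focus on the proof for the white noise with drift model
(1). The proof for the sequence model (2) is analogous. Fix $\varepsilon > 0$. For any
$\delta > 0$ there are functions $f_1 \in \mathcal F_1$ and $f_2 \in \mathcal F$ such that

$$\|f_2 - f_1\|_2 \le \frac{\varepsilon}{\sqrt n} \quad\text{and}\quad |Tf_2 - Tf_1| \ge \omega_+\Big(\frac{\varepsilon}{\sqrt n},\mathcal F_1,\mathcal F\Big) - \delta.$$

Denote by $P_i$ the probability measure associated with the white noise process

$$dY(t) = f_i(t)\,dt + \frac{1}{\sqrt n}\,dW(t), \qquad -\tfrac12 \le t \le \tfrac12,\ i = 1,2.$$

Let $\beta_n^2 = n\|f_1 - f_2\|_2^2$. Then a sufficient statistic for the family of measures
$\{P_i : i = 1,2\}$ is given by the log-likelihood ratio $S_n = \log(dP_2/dP_1)$ with

$$S_n \sim \begin{cases} N(-\beta_n^2/2,\ \beta_n^2) & \text{under } P_1,\\ N(\beta_n^2/2,\ \beta_n^2) & \text{under } P_2.\end{cases}$$

An equivalent sufficient statistic is thus given by

$$Q_n = \frac{Tf_1 + Tf_2}{2} + \frac{Tf_2 - Tf_1}{\beta_n^2}\,S_n,$$

where

$$Q_n \sim \begin{cases} N\big(Tf_1,\ (Tf_2 - Tf_1)^2/\beta_n^2\big) & \text{under } P_1,\\ N\big(Tf_2,\ (Tf_2 - Tf_1)^2/\beta_n^2\big) & \text{under } P_2.\end{cases}$$

It follows from Proposition 1 that for any confidence interval $CI(Q_n)$ based
on $Q_n$,

$$E_{f_1} L(CI(Q_n)) \ge |Tf_2 - Tf_1|\Big(1-\alpha-\Phi\Big(\frac{|Tf_2 - Tf_1|}{\sigma}-z_\alpha\Big)\Big)_+,$$

where $\sigma = |Tf_2 - Tf_1|/\beta_n$. Hence

$$E_{f_1} L(CI(Q_n)) \ge |Tf_2 - Tf_1|\big(1-\alpha-\Phi(\sqrt n\,\|f_2 - f_1\|_2 - z_\alpha)\big)_+ \ge \Big(\omega_+\Big(\frac{\varepsilon}{\sqrt n},\mathcal F_1,\mathcal F\Big) - \delta\Big)\big(1-\alpha-\Phi(\varepsilon - z_\alpha)\big)_+.$$

Letting $\delta \to 0$, it follows that for any $\varepsilon > 0$,

$$L(CI(Q_n),\mathcal F_1) \ge \omega_+\Big(\frac{\varepsilon}{\sqrt n},\mathcal F_1,\mathcal F\Big)\big(1-\alpha-\Phi(\varepsilon - z_\alpha)\big)_+.$$

By the sufficiency of $Q_n$, it follows that for any confidence interval $CI \in \mathcal I_{\alpha,\mathcal F}$,

$$L(CI,\mathcal F_1) \ge \sup_{\varepsilon > 0}\ \omega_+\Big(\frac{\varepsilon}{\sqrt n},\mathcal F_1,\mathcal F\Big)\big(1-\alpha-\Phi(\varepsilon - z_\alpha)\big)_+. \tag{8}$$

The theorem follows on taking $\varepsilon = z_\alpha$, for which $\Phi(\varepsilon - z_\alpha) = \tfrac12$. □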
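The final step of the proof picks the particular value $\varepsilon = z_\alpha$ in (8). The sketch below evaluates the expression inside the supremum of (8) for a hypothetical Hölderian modulus $\omega_+(\varepsilon) = C\varepsilon^q$ (the modulus, $C$, $q$, $n$ and $\alpha$ are all assumptions for illustration) and approximates the supremum on a grid. Because the factor $(1-\alpha-\Phi(\varepsilon-z_\alpha))_+$ vanishes for $\varepsilon \ge 2z_\alpha$, the supremum is at most $2^q(1-\alpha)/(1/2-\alpha)$ times the value at $\varepsilon = z_\alpha$, so the choice made in the proof only affects the constant.

```python
import math
from statistics import NormalDist

STD_NORMAL = NormalDist()


def bound8(eps, alpha, modulus, n):
    """Expression inside the supremum in (8) at a given eps."""
    z = STD_NORMAL.inv_cdf(1 - alpha)
    return modulus(eps / math.sqrt(n)) * max(0.0, 1 - alpha - STD_NORMAL.cdf(eps - z))


def sup_bound8(alpha, modulus, n, eps_max=10.0, grid=5000):
    """Grid approximation of the supremum in (8); eps = z_alpha is always included."""
    z = STD_NORMAL.inv_cdf(1 - alpha)
    candidates = [eps_max * i / grid for i in range(1, grid + 1)] + [z]
    return max(bound8(e, alpha, modulus, n) for e in candidates)


def holder_modulus(eps, C=1.0, q=2 / 3):
    """Hypothetical Hölderian between class modulus omega_+(eps) = C * eps**q."""
    return C * eps ** q


alpha, n = 0.05, 400
z = STD_NORMAL.inv_cdf(1 - alpha)
at_z = bound8(z, alpha, holder_modulus, n)   # (1/2 - alpha) * omega_+(z_alpha / sqrt(n))
best = sup_bound8(alpha, holder_modulus, n)
print(at_z, best, best / at_z)
```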

Remark 1. Although the primary use of this theorem is for adaptive
confidence intervals, it can also be used to show that from a minimax point of
view there is relatively little to gain by using variable length intervals. In the
minimax setting Donoho (1994) showed that over a given convex parameter
space $\mathcal F$, fixed length confidence intervals for a linear functional $Tf$ with
coverage of at least $1-\alpha$ must have maximum length at least … and
that fixed length confidence intervals can be centered on affine estimators
with maximum length at most $2\omega(z_{\alpha/2}/\sqrt n,\mathcal F)$. By taking $\mathcal F_1 = \mathcal F$, Theorem 1
yields that the minimax expected length of a $1-\alpha$ level confidence interval
over any parameter space $\mathcal F$ satisfies

$$L^*_\alpha(\mathcal F) \ge \Big(\frac12-\alpha\Big)\,\omega\Big(\frac{z_\alpha}{\sqrt n},\mathcal F\Big). \tag{9}$$

This shows that for any given $\alpha < 1/2$ the optimal variable length confidence
intervals must have maximum expected length at least a fixed constant factor
of the length of the shortest fixed length confidence interval when the
parameter space $\mathcal F$ is convex.






T. T. CAI AND M. G. LOW 


2.2. Adaptive confidence interval. There are at least two natural ways
to define adaptive confidence intervals over a collection of convex parameter
spaces $\{\mathcal F_i, i = 1,\dots,k\}$. Let $\mathcal F = \bigcup_{i=1}^k \mathcal F_i$. Call a confidence interval $CI \in
\mathcal I_{\alpha,\mathcal F}$ adaptive over the collection $\{\mathcal F_i, i = 1,\dots,k\}$ if, for all $1 \le i \le k$,

$$L(CI,\mathcal F_i) \le C_i(\alpha)\,L^*_\alpha(\mathcal F_i,\mathcal F), \tag{10}$$

where $C_i(\alpha)$ are constants depending on $\alpha$ only. In other words, a confidence
interval which adapts over the parameter spaces $\mathcal F_i$ attains, within a constant factor,
the lower bound given in Theorem 1 for each $i$ while maintaining coverage over $\mathcal F$. We shall
show that such adaptive confidence intervals can always be constructed when
$k$ is finite.

It is also reasonable, in light of the minimax discussion given above, to
call a confidence interval $CI \in \mathcal I_{\alpha,\mathcal F}$ adaptive in a stronger sense over the collection of
parameter spaces $\mathcal F_i$ if, for all $1 \le i \le k$,

$$L(CI,\mathcal F_i) \le C_i(\alpha)\,\omega\Big(\frac{z_\alpha}{\sqrt n},\mathcal F_i\Big), \tag{11}$$

where $C_i(\alpha)$ are constants depending on $\alpha$ only. We shall call such a confidence
interval strongly adaptive. It is clear that a confidence interval which is
strongly adaptive is also adaptive. However, strongly adaptive confidence
intervals do not always exist. Low (1997) has given examples where
$L^*_\alpha(\mathcal F_1,\mathcal F)/L^*_\alpha(\mathcal F_1) \to \infty$, in which case strongly adaptive confidence intervals do not exist.
Other examples are given in Section 2.3 and throughout the paper. On the other
hand, when $L^*_\alpha(\mathcal F_1,\mathcal F) \asymp L^*_\alpha(\mathcal F_1)$, strongly adaptive confidence intervals do exist and
any confidence interval which is adaptive is also strongly adaptive.

In this section the focus is on adaptation over two parameter spaces, where
the theory is most easily understood. For two parameter spaces $\mathcal F_1 \subset \mathcal F$,
Theorem 1 gives a lower bound for the maximum expected length over $\mathcal F_1$ of
confidence intervals with guaranteed coverage over $\mathcal F$. We now show that the
lower bound can in fact be attained within a constant factor not depending
on $n$ when $\mathcal F_1$ is convex and $\mathcal F$ is the union of $\mathcal F_1$ and another convex set
$\mathcal F_2$.

Let $\{\mathcal F_1,\mathcal F_2\}$ be a pair of convex parameter spaces with nonempty intersection
and let $\mathcal F = \mathcal F_1 \cup \mathcal F_2$. Our first objective is to construct a confidence
interval for a linear functional $Tf$ which has guaranteed coverage probability
of $1-\alpha$ over $\mathcal F$ and has maximum expected length over $\mathcal F_1$ within a constant
factor of the lower bound given in Theorem 1, namely, for any $CI \in \mathcal I_{\alpha,\mathcal F}$,

$$L(CI,\mathcal F_1) \ge \Big(\frac12-\alpha\Big)\,\omega_+\Big(\frac{z_\alpha}{\sqrt n},\mathcal F_1,\mathcal F\Big). \tag{12}$$
The construction of the adaptive confidence interval relies on the ordered
modulus $\omega$ as given in (4). For $1 \le i,j \le 2$, set

$$\omega_{i,j} = \omega\Big(\frac{z_{\alpha/2}}{\sqrt n},\mathcal F_i,\mathcal F_j\Big).$$

Cai and Low (2004) give an algorithm for the construction of a linear estimator
$\hat T_{i,j}$ which has variance bounded by

$$\operatorname{Var}(\hat T_{i,j}) \le \frac{\omega_{i,j}^2}{z_{\alpha/2}^2} \tag{13}$$

and bias which satisfies

$$\inf_{f\in\mathcal F_j}\big(E(\hat T_{i,j}) - Tf\big) \ge -\tfrac12\,\omega_{i,j} \tag{14}$$

and

$$\sup_{f\in\mathcal F_i}\big(E(\hat T_{i,j}) - Tf\big) \le \tfrac12\,\omega_{i,j}. \tag{15}$$

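We shall use the linear estimators $\hat T_{i,j}$ to construct a confidence interval
which has guaranteed coverage probability over $\mathcal F$ and which also has expected
length over $\mathcal F_1$ within a constant factor of the lower bound given in
Theorem 1. For $j = 1$ and $2$ define the confidence intervals $CI^*_{j,\alpha}$ by

$$CI^*_{j,\alpha} = \Big[\min_{i=1,2}\big\{\hat T_{i,j} - \tfrac32\,\omega_{i,j}\big\},\ \max_{i=1,2}\big\{\hat T_{j,i} + \tfrac32\,\omega_{j,i}\big\}\Big]. \tag{16}$$

The following result shows that the confidence interval $CI^*_{1,\alpha}$ attains the
lower bound on the maximum expected length over $\mathcal F_1$ given in (7) within a
constant factor not depending on $n$, and satisfies the constraint that it has
minimum coverage of $1-\alpha$ for all $f \in \mathcal F$.

Lemma 1. Let $\mathcal F_1$ and $\mathcal F_2$ be convex parameter spaces with $\mathcal F_1 \cap \mathcal F_2 \ne \emptyset$
and let $\mathcal F = \mathcal F_1 \cup \mathcal F_2$. Let the interval $CI^*_{j,\alpha}$ be defined as in (16) for $j = 1$ and
$2$. Then $CI^*_{j,\alpha} \in \mathcal I_{\alpha,\mathcal F}$ and $CI^*_{j,\alpha}$ has expected length over $\mathcal F_j$ which satisfies

(17)

Lemma 1 follows from the proof of Proposition 4 given in Section 4.1.

Remark 2. Theorem 1 and Lemma 1 together show that, under the
conditions of Lemma 1,

$$L^*_\alpha(\mathcal F_1,\mathcal F) \asymp \omega_+\Big(\frac{z_\alpha}{\sqrt n},\mathcal F_1,\mathcal F\Big). \tag{18}$$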
Although the interval $CI^*_{1,\alpha}$ has guaranteed coverage probability over $\mathcal F$
and optimal expected length over $\mathcal F_1$, it may not have optimal expected
length over $\mathcal F$, because the expected length over $\mathcal F_2$ is not controlled. On the
other hand, by symmetry $CI^*_{2,\alpha}$ has guaranteed coverage probability over $\mathcal F$
and optimal expected length over $\mathcal F_2$. By Bonferroni, the confidence interval
$CI^*_\alpha = CI^*_{1,\alpha/2} \cap CI^*_{2,\alpha/2}$ also has coverage probability of at least $1-\alpha$, and
so $CI^*_\alpha \in \mathcal I_{\alpha,\mathcal F}$. Furthermore, since the length of an intersection is at most the
length of either interval, it is easy to see that $CI^*_\alpha$ has optimal expected
length over both $\mathcal F_1$ and $\mathcal F_2$, and hence also over $\mathcal F$. In other words, the
confidence interval $CI^*_\alpha$ is a $1-\alpha$ level adaptive confidence interval over $\mathcal F_1$
and $\mathcal F_2$.


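Proposition 2. Let $\mathcal F_1$ and $\mathcal F_2$ be convex parameter spaces with $\mathcal F_1 \cap
\mathcal F_2 \ne \emptyset$ and let $\mathcal F = \mathcal F_1 \cup \mathcal F_2$. Let the interval $CI^*_{j,\alpha}$ be defined as in (16) and
let $CI^*_\alpha = CI^*_{1,\alpha/2} \cap CI^*_{2,\alpha/2}$. Then $CI^*_\alpha$ is a $1-\alpha$ level adaptive confidence
interval over $\mathcal F_1$ and $\mathcal F_2$. That is, $CI^*_\alpha \in \mathcal I_{\alpha,\mathcal F}$ and for both $j = 1$ and $2$,

$$L^*_\alpha(\mathcal F_j,\mathcal F) \le L(CI^*_\alpha,\mathcal F_j) \le C(\alpha)\,L^*_\alpha(\mathcal F_j,\mathcal F), \tag{19}$$

where $C(\alpha)$ is a constant depending only on $\alpha$. Consequently $L(CI^*_\alpha,\mathcal F_j) \asymp
\omega_+(z_\alpha/\sqrt n,\mathcal F_j,\mathcal F)$.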
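The Bonferroni construction of Proposition 2 is mechanical once the estimates $\hat T_{i,j}$ and moduli $\omega_{i,j}$ are available. The sketch below assumes they are simply handed in as dictionaries (the estimator construction of Cai and Low (2004) is not reproduced) and reads (16) as $CI^*_{j,\alpha} = [\min_i\{\hat T_{i,j} - \tfrac32\omega_{i,j}\}, \max_i\{\hat T_{j,i} + \tfrac32\omega_{j,i}\}]$ with $i$ ranging over $\{1,2\}$. That reading, the function names and all numbers are assumptions for illustration. For $CI^*_{j,\alpha/2}$ the moduli should be evaluated with $z_{\alpha/4}$ in place of $z_{\alpha/2}$; here they are just given.

```python
def ci_star(j, T, w, spaces=(1, 2)):
    """CI*_j under the stated reading of (16).

    T[(i, j)] is the estimate hat T_{i,j}; w[(i, j)] is the modulus omega_{i,j},
    already evaluated at the level used for this interval.
    """
    lower = min(T[(i, j)] - 1.5 * w[(i, j)] for i in spaces)
    upper = max(T[(j, i)] + 1.5 * w[(j, i)] for i in spaces)
    return lower, upper


def bonferroni_adaptive(T, w):
    """CI*_alpha = CI*_{1,alpha/2} intersected with CI*_{2,alpha/2}; None if empty."""
    lo1, hi1 = ci_star(1, T, w)
    lo2, hi2 = ci_star(2, T, w)
    lower, upper = max(lo1, lo2), min(hi1, hi2)
    return (lower, upper) if lower <= upper else None


# Hypothetical estimates and moduli.
T = {(1, 1): 0.0, (1, 2): 0.1, (2, 1): -0.1, (2, 2): 0.05}
w = {(1, 1): 0.2, (1, 2): 0.3, (2, 1): 0.4, (2, 2): 0.5}
print(ci_star(1, T, w), ci_star(2, T, w), bonferroni_adaptive(T, w))
```

Returning `None` for an empty intersection is a design choice of this sketch; an empty set is a legitimate (non-covering) outcome, and the intersection is never longer than either of the two intervals.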


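Remark 3. It is shown in Cai and Low (2004) that the ordered modulus
is concave. It follows that, if $b \ge 1$, then for all $\varepsilon > 0$,

$$\begin{aligned}
\omega_+(b\varepsilon,\mathcal F,\mathcal G) &= \max\{\omega(b\varepsilon,\mathcal F,\mathcal G),\ \omega(b\varepsilon,\mathcal G,\mathcal F)\}\\
&\le \max\{b\,\omega(\varepsilon,\mathcal F,\mathcal G),\ b\,\omega(\varepsilon,\mathcal G,\mathcal F)\}\\
&= b\,\omega_+(\varepsilon,\mathcal F,\mathcal G).
\end{aligned} \tag{20}$$

It then follows from the bounds given in (7) and (17) and inequality (20)
that the constant $C(\alpha)$ in (19) can be taken as

$$C(\alpha) = \frac{9 + 4z_{\alpha/4}}{(1/2-\alpha)\,z_\alpha}.$$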
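Inequality (20) is what allows the proofs to move between the levels $z_{\alpha/2}$ and $z_\alpha$: with $b = z_{\alpha/2}/z_\alpha \ge 1$ it gives $\omega_+(z_{\alpha/2}/\sqrt n) \le b\,\omega_+(z_\alpha/\sqrt n)$. The sketch below checks (20) numerically for two hypothetical concave ordered moduli vanishing at zero, $2\varepsilon^{0.6}$ and $\varepsilon^{0.8}$; these functions are assumptions chosen for illustration, not moduli derived in the paper.

```python
from statistics import NormalDist


def omega_fg(eps):
    """Hypothetical concave ordered modulus omega(eps, F, G)."""
    return 2.0 * eps ** 0.6


def omega_gf(eps):
    """Hypothetical concave ordered modulus omega(eps, G, F)."""
    return eps ** 0.8


def omega_plus(eps):
    """Between class modulus as the maximum of the two ordered moduli."""
    return max(omega_fg(eps), omega_gf(eps))


def scaling_holds(b, eps):
    """Check omega_+(b eps) <= b omega_+(eps), inequality (20), up to rounding."""
    return omega_plus(b * eps) <= b * omega_plus(eps) * (1 + 1e-12)


# The factor used when converting z_{alpha/2} to z_alpha.
alpha = 0.05
z_alpha = NormalDist().inv_cdf(1 - alpha)
z_half = NormalDist().inv_cdf(1 - alpha / 2)
b = z_half / z_alpha
print(b, scaling_holds(b, z_alpha / 20.0))
```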


2.3. Discussion. In nonparametric function estimation the goal of adaptive
estimation is often framed in terms of achieving optimality results
simultaneously over a collection of parameter spaces $\{\mathcal F_j\}$. The benchmark for
success is given by how well one could do if the parameter space were completely
specified. We termed any confidence interval attaining this benchmark strongly adaptive.

So far, attention has focused on constructing adaptive confidence procedures
which attain the lower bound on expected length given in Theorem
1. This bound gives the best one can do in this adaptive confidence interval
problem. The lower bound, however, may differ quite dramatically from the
minimax expected length if the parameter space $\mathcal F_j$ is prespecified. In
particular, suppose, as is common, that the between class modulus of continuity
is Hölderian. That is, the modulus satisfies

$$\omega_+(\varepsilon,\mathcal F_i,\mathcal F_j) = C_{ij}\,\varepsilon^{q_{ij}}(1 + o(1)), \qquad 1 \le i,j \le 2,$$

for some constants $C_{ij} > 0$ and $0 < q_{ij} \le 1$. Such is the case in the examples
given in Section 3.2 and also in many other commonly treated problems.
When the modulus $\omega_+(\varepsilon,\mathcal F,\mathcal G)$ is Hölderian, write $q(\mathcal F,\mathcal G)$ for the exponent
of the modulus. That is,

$$\omega_+(\varepsilon,\mathcal F,\mathcal G) \asymp \varepsilon^{q(\mathcal F,\mathcal G)}.$$

Also set $q(\mathcal G) = q(\mathcal G,\mathcal G)$.

Without loss of generality, assume $q(\mathcal F_1) \ge q(\mathcal F_2)$. Throughout the remainder
of the paper $C$ is used to denote a generic constant which may vary from
place to place; set $\mathcal F = \mathcal F_1 \cup \mathcal F_2$. Note that $q(\mathcal F_1,\mathcal F) = \min\{q(\mathcal F_1), q(\mathcal F_1,\mathcal F_2)\}$
and $q(\mathcal F) = \min\{q(\mathcal F_1), q(\mathcal F_2), q(\mathcal F_1,\mathcal F_2)\}$. In this setup strongly adaptive
confidence intervals exist if and only if $q(\mathcal F_1,\mathcal F) = q(\mathcal F_1)$, or equivalently

$$q(\mathcal F_1) \le q(\mathcal F_1,\mathcal F_2).$$

There are four cases of interest.

Case 1. $q(\mathcal F_2) \le q(\mathcal F_1) \le q(\mathcal F_1,\mathcal F_2)$. In this case $q(\mathcal F_1,\mathcal F) = q(\mathcal F_1)$ and
strongly adaptive confidence intervals exist. These intervals have maximum
expected length which can attain the same optimal rate of convergence as
the minimax confidence interval over a known $\mathcal F_i$. Specific shape restricted
examples which illustrate this case and the more general theory are given in
Section 3.2.

Case 2. $q(\mathcal F_1,\mathcal F_2) = q(\mathcal F_2) < q(\mathcal F_1)$. In this case $q(\mathcal F_1,\mathcal F) < q(\mathcal F_1)$ and
thus strongly adaptive confidence intervals do not exist. Adaptive confidence
intervals of level $1-\alpha$ over $\mathcal F_1$ and $\mathcal F_2$ have maximum expected length over
$\mathcal F_1$ which satisfies

$$L(CI,\mathcal F_1) \ge \Big(\frac12-\alpha\Big)\,\omega_+\Big(\frac{z_\alpha}{\sqrt n},\mathcal F_1,\mathcal F\Big) \asymp n^{-q(\mathcal F_2)/2}. \tag{21}$$

In contrast, if it is known that $f \in \mathcal F_1$, $1-\alpha$ level confidence intervals can
be constructed which satisfy

$$L(CI,\mathcal F_1) \le C n^{-q(\mathcal F_1)/2},$$

a rate strictly faster than $n^{-q(\mathcal F_2)/2}$. Hence from this point of view the cost of adaptation is substantial. The rate
of convergence of the maximum expected length of $CI$ over $\mathcal F_1$ is the same
as that for the maximum expected length over $\mathcal F$.
Example 1. Consider estimating the linear functional $Tf = f(0)$ over
Lipschitz classes based on the Gaussian observations given in (1). For $0 <
\beta \le 1$ and $-\tfrac12 \le a < b \le \tfrac12$, the Lipschitz function class over the interval $[a,b]$
is defined as

$$F(\beta,M,[a,b]) = \{f : [-\tfrac12,\tfrac12] \to \mathbb R : |f(x) - f(y)| \le M|x-y|^\beta \text{ for } x,y \in [a,b]\}.$$

It is also convenient to write $F(\beta,M)$ for $F(\beta,M,[-\tfrac12,\tfrac12])$.

Let $0 < \beta_2 < \beta_1 \le 1$ and set $\mathcal F_i = F(\beta_i,M)$ for $i = 1,2$. In this case standard
calculations, as outlined for example in Cai and Low (2002), show that
$\omega(\varepsilon,\mathcal F_1) = C\varepsilon^{2\beta_1/(2\beta_1+1)}(1 + o(1))$ and $\omega(\varepsilon,\mathcal F_1,\mathcal F_2) = C\varepsilon^{2\beta_2/(2\beta_2+1)}(1 + o(1))$.
Hence

$$q(\mathcal F_1,\mathcal F) = q(\mathcal F_1,\mathcal F_2) = \frac{2\beta_2}{2\beta_2+1} < q(\mathcal F_1) = \frac{2\beta_1}{2\beta_1+1}.$$

Case 3. $q(\mathcal F_2) < q(\mathcal F_1,\mathcal F_2) < q(\mathcal F_1)$. In this case $q(\mathcal F_1,\mathcal F) < q(\mathcal F_1)$ and
strongly adaptive confidence intervals do not exist. Any $1-\alpha$ level adaptive
confidence interval $CI$ over $\mathcal F_1$ and $\mathcal F_2$ must have maximum expected length
over $\mathcal F_1$ satisfying

$$L(CI,\mathcal F_1) \ge \Big(\frac12-\alpha\Big)\,\omega_+\Big(\frac{z_\alpha}{\sqrt n},\mathcal F_1,\mathcal F\Big) \asymp n^{-q(\mathcal F_1,\mathcal F_2)/2} \gg n^{-q(\mathcal F_1)/2}. \tag{23}$$

The cost of adaptation in this case is that the rate of convergence of the
maximum expected length of $CI$ over $\mathcal F_1$ is slower than if it is known
that $f \in \mathcal F_1$, but faster than for the maximum expected length over $\mathcal F_2$. An
example for this case can be given as follows.


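Example 2. Suppose that the white noise with drift process (1) is observed
and that the linear functional is $Tf = f(0)$. Let the Lipschitz class
$F(\beta,M,[a,b])$ be defined as above and let $\mathcal D$ be the set of all decreasing
functions on $[-\tfrac12,\tfrac12]$. Set

$$F_{\mathcal D}(\beta_1,M_1,\beta_2,M_2) = F(\beta_1,M_1,[-\tfrac12,0]) \cap F(\beta_2,M_2,[0,\tfrac12]) \cap \mathcal D.$$

Let $\mathcal F_1 = F_{\mathcal D}(\gamma_1,M_1,\gamma_2,M_2)$ and $\mathcal F_2 = F_{\mathcal D}(\beta_1,N_1,\beta_2,N_2)$ with $1 \ge \gamma_1 > \gamma_2 >
\beta_1 > \beta_2 > 0$. Then, as in Cai and Low (2002), it is easy to check that

$$\begin{aligned}
\omega(\varepsilon,\mathcal F_1) &= C\varepsilon^{2\gamma_1/(2\gamma_1+1)}(1 + o(1)),\\
\omega(\varepsilon,\mathcal F_2) &= C\varepsilon^{2\beta_1/(2\beta_1+1)}(1 + o(1)),\\
\omega(\varepsilon,\mathcal F_1,\mathcal F_2) &= C\varepsilon^{2\gamma_2/(2\gamma_2+1)}(1 + o(1)),\\
\omega(\varepsilon,\mathcal F_2,\mathcal F_1) &= C'\varepsilon^{2\gamma_1/(2\gamma_1+1)}(1 + o(1)).
\end{aligned} \tag{24}$$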
Note that in this case $\omega(\varepsilon,\mathcal F_1,\mathcal F_2) \ne \omega(\varepsilon,\mathcal F_2,\mathcal F_1)(1 + o(1))$. Since $\gamma_1 > \gamma_2$, it
then follows from (24) that

$$q(\mathcal F_1,\mathcal F) = q(\mathcal F_1,\mathcal F_2) = \frac{2\gamma_2}{2\gamma_2+1}.$$

Hence $0 < q(\mathcal F_2) < q(\mathcal F_1,\mathcal F) < q(\mathcal F_1) \le 1$.

Case 4. $q(\mathcal F_1,\mathcal F_2) < q(\mathcal F_2) \le q(\mathcal F_1)$. In this case, strongly adaptive confidence
intervals do not exist and the cost of adaptation is extraordinary. If
$f$ is known to be in $\mathcal F_i$, one can attain the rate of convergence $n^{-q(\mathcal F_i)/2}$ for
the maximum expected length of the optimal $1-\alpha$ level confidence interval
over $\mathcal F_i$. Without this information, $1-\alpha$ level adaptive confidence intervals
over $\mathcal F_1$ and $\mathcal F_2$ must have maximum expected length over $\mathcal F_i$ at least of
order $n^{-q(\mathcal F_1,\mathcal F_2)/2}$. An example is given below.


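Example 3. Once again consider the white noise model with $Tf = f(0)$.
Let

$$F(\beta_1,M_1,\beta_2,M_2) = F(\beta_1,M_1,[-\tfrac12,0]) \cap F(\beta_2,M_2,[0,\tfrac12])$$

and consider $0 < \gamma_2 < \gamma_1 \le 1$ and $0 < \beta_1 < \beta_2 \le 1$. Set $\mathcal F_1 = F(\gamma_1,M_1,\gamma_2,M_2)$
and $\mathcal F_2 = F(\beta_1,N_1,\beta_2,N_2)$. Standard calculations show that $\omega(\varepsilon,\mathcal F_1) = C\varepsilon^{2\gamma_1/(2\gamma_1+1)}(1 +
o(1))$ and $\omega(\varepsilon,\mathcal F_2) = C\varepsilon^{2\beta_2/(2\beta_2+1)}(1 + o(1))$. The between class modulus is
given as

$$\omega(\varepsilon,\mathcal F_1,\mathcal F_2) = C\varepsilon^{2p/(2p+1)}(1 + o(1)), \tag{25}$$

where $p = \max(\min(\gamma_1,\beta_1), \min(\gamma_2,\beta_2))$.

When $\gamma_1 > \beta_2 > \beta_1 > \gamma_2$, the quantity $p$ in (25) equals $\beta_1$ and hence

$$q(\mathcal F_1,\mathcal F_2) = \frac{2\beta_1}{2\beta_1+1}.$$

Therefore in this case $q(\mathcal F_1,\mathcal F_2) < \min(q(\mathcal F_1), q(\mathcal F_2))$.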
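The four cases above depend only on the three exponents $q(\mathcal F_1)$, $q(\mathcal F_2)$ and $q(\mathcal F_1,\mathcal F_2)$, so the classification can be written as a small function. The sketch below does this under the convention $q(\mathcal F_1) \ge q(\mathcal F_2)$ and feeds it exponents of the form $2\beta/(2\beta+1)$ read off from Examples 1 to 3; the specific smoothness values ($\beta_1 = 1$, $\beta_2 = 1/2$ and so on) and the function names are hypothetical choices for illustration.

```python
import math


def holder_exponent(beta):
    """Exponent 2*beta/(2*beta + 1) appearing in Examples 1-3."""
    return 2 * beta / (2 * beta + 1)


def adaptation_case(q1, q2, q12):
    """Section 2.3 case from q(F1), q(F2), q(F1, F2), assuming q(F1) >= q(F2).

    Returns (case, q(F1, F), q(F)); the adaptive rate over F1 is n**(-q(F1, F)/2),
    while the rate attainable when f is known to lie in F1 is n**(-q1/2).
    """
    if q1 < q2:
        raise ValueError("label the spaces so that q(F1) >= q(F2)")
    q1_F = min(q1, q12)
    q_F = min(q1, q2, q12)
    if q1 <= q12:
        case = 1                      # strongly adaptive intervals exist
    elif math.isclose(q12, q2):
        case = 2
    elif q12 > q2:
        case = 3
    else:
        case = 4
    return case, q1_F, q_F


# Example 1 with beta_1 = 1, beta_2 = 1/2.
print(adaptation_case(holder_exponent(1.0), holder_exponent(0.5), holder_exponent(0.5)))  # -> (2, 0.5, 0.5)
# Example 2 with (gamma_1, gamma_2, beta_1, beta_2) = (0.9, 0.7, 0.5, 0.3).
print(adaptation_case(holder_exponent(0.9), holder_exponent(0.5), holder_exponent(0.7))[0])  # 3
# Example 3 with gamma_1 = 0.9 > beta_2 = 0.7 > beta_1 = 0.5 > gamma_2 = 0.3.
print(adaptation_case(holder_exponent(0.9), holder_exponent(0.7), holder_exponent(0.5))[0])  # 4
```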


3. Adaptation over nested parameter spaces. Section 2 gave the adaptation
theory for two convex parameter spaces. This theory can be extended
to more general collections of parameter spaces. In this section the focus is
on adaptation over a collection of a finite or countably infinite number of
nested convex parameter spaces, $\mathcal F_1 \subset \mathcal F_2 \subset \cdots \subset \mathcal F_k$, where in the case of
$k = \infty$, $\mathcal F_\infty$ denotes $\bigcup_{i=1}^\infty \mathcal F_i$. The objective is, for a given linear functional
$Tf$, to construct variable length confidence intervals which have coverage
probability of at least $1-\alpha$ over $\mathcal F_k$ and which simultaneously minimize the
expected length over each of the parameter spaces $\mathcal F_j$. A target for these
expected lengths has been provided by the lower bound given in Theorem
1, namely

$$L^*_\alpha(\mathcal F_j,\mathcal F_k) \ge \Big(\frac12-\alpha\Big)\,\omega_+\Big(\frac{z_\alpha}{\sqrt n},\mathcal F_j,\mathcal F_k\Big), \tag{26}$$

where $\omega_+(\varepsilon,\mathcal F_j,\mathcal F_k)$ is the between class modulus as given in (5).

The major result of this section is to show that adaptive confidence intervals
exist and to construct such adaptive intervals. As in Section 2.2, the
construction of these adaptive confidence procedures relies on the ordered
modulus $\omega(\varepsilon,\mathcal F_i,\mathcal F_j)$ as given in (4). For $1 \le i,j \le k$ set $\omega_{i,j} = \omega(z_{\alpha/2}/\sqrt n,\mathcal F_i,\mathcal F_j)$
and let $\hat T_{i,j}$ be linear estimators with variances and biases bounded as in
(13)–(15).

The confidence procedure is built in two steps. In the first step, for each
$1 \le j \le k$ an interval is constructed which controls the coverage probability
over $\mathcal F_k$ and which also has expected length over $\mathcal F_j$ within a constant factor
of the lower bound given by (26). In the second step these intervals are
combined to create a single interval which maintains coverage while simultaneously
attaining an expected length over every $\mathcal F_j$ within a fixed constant
factor of the lower bound given in (26).

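For the first step define the confidence intervals $CI^*_j$ as follows. For $1 \le
j \le k$ set $\xi_j = \omega_+(z_{\alpha/2}/\sqrt n,\mathcal F_j,\mathcal F_k)$ and define $CI^*_j$ by

$$CI^*_j = \Big[\frac{\hat T_{j,k} + \hat T_{k,j}}{2} - \big\{(\hat T_{j,k} - \hat T_{k,j})_+ + 2\xi_j\big\},\ \frac{\hat T_{j,k} + \hat T_{k,j}}{2} + \big\{(\hat T_{j,k} - \hat T_{k,j})_+ + 2\xi_j\big\}\Big]. \tag{27}$$

Lemma 2 shows that these intervals have guaranteed coverage over $\mathcal F_k$ and
near optimal expected length over $\mathcal F_j$.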
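Interval (27) needs only the two estimates $\hat T_{j,k}$, $\hat T_{k,j}$ and the number $\xi_j$. The sketch computes it and then runs a small Monte Carlo check of the coverage claim of Lemma 2 for one hypothetical configuration: $Tf = 0$, biases pushed to the extremes allowed by (14)–(15) for $f \in \mathcal F_k$, standard deviations at the bounds in (13), and, as an extra simplifying assumption not made in the paper, independent errors for the two estimators. The values of $\omega_{j,k}$, $\omega_{k,j}$, the seed and the function names are illustrative only; this is a sanity check, not a proof.

```python
import random
from statistics import NormalDist


def ci_star_nested(t_jk, t_kj, xi_j, mult=2.0):
    """Interval (27); mult=3.0 gives interval (28) of Remark 4."""
    centre = 0.5 * (t_jk + t_kj)
    half_width = max(t_jk - t_kj, 0.0) + mult * xi_j
    return centre - half_width, centre + half_width


def simulated_miss_rate(alpha, w_jk, w_kj, reps=20000, seed=1):
    """Monte Carlo non-coverage of (27) under the worst-case biases of (14)-(15)."""
    z = NormalDist().inv_cdf(1 - alpha / 2)
    xi = max(w_jk, w_kj)              # xi_j = max(omega_{j,k}, omega_{k,j})
    rng = random.Random(seed)
    misses = 0
    for _ in range(reps):
        # E hat T_{j,k} - Tf >= -w_jk / 2 and E hat T_{k,j} - Tf <= w_kj / 2, with Tf = 0.
        t_jk = -0.5 * w_jk + rng.gauss(0.0, w_jk / z)
        t_kj = 0.5 * w_kj + rng.gauss(0.0, w_kj / z)
        lower, upper = ci_star_nested(t_jk, t_kj, xi)
        misses += not (lower <= 0.0 <= upper)
    return misses / reps


print(ci_star_nested(1.0, 0.0, 0.5))        # (-1.5, 2.5)
print(simulated_miss_rate(0.1, 0.3, 0.5))
```

The interval always contains $[\hat T_{k,j} - 2\xi_j, \hat T_{j,k} + 2\xi_j]$, which is the fact the coverage argument in the proof of Lemma 2 relies on.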


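Remark 4. This interval is designed for $0 < \alpha \le 0.2$. If $0.2 < \alpha < 0.5$, all
subsequent results hold with minor modifications, as noted in later remarks,
when the interval is replaced by

$$CI^*_j = \Big[\frac{\hat T_{j,k} + \hat T_{k,j}}{2} - \big\{(\hat T_{j,k} - \hat T_{k,j})_+ + 3\xi_j\big\},\ \frac{\hat T_{j,k} + \hat T_{k,j}}{2} + \big\{(\hat T_{j,k} - \hat T_{k,j})_+ + 3\xi_j\big\}\Big]. \tag{28}$$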
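Lemma 2. For $0 < \alpha \le 0.2$, the confidence interval $CI^*_j$ defined in (27)
has coverage probability of at least $1 - \tfrac13\alpha$ for all $f \in \mathcal F_k$ and satisfies

$$L(CI^*_j,\mathcal F_j) \le \Big\{2\Phi\Big(\frac{z_{\alpha/2}}{2}\Big) + 4 + \frac{4}{\sqrt{2\pi}\,z_{\alpha/2}}\,e^{-z_{\alpha/2}^2/8}\Big\}\,\omega_+\Big(\frac{z_{\alpha/2}}{\sqrt n},\mathcal F_j,\mathcal F_k\Big). \tag{29}$$

Remark 5. For $0.2 < \alpha < 0.5$ the interval given in (28) satisfies the
same coverage bound but has expected length bounded by $10\,\omega_+(z_{\alpha/2}/\sqrt n,\mathcal F_j,\mathcal F_k)$.

In the following proof, and throughout the rest of the paper, write $Z$ for
a standard Normal random variable.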


Proof of Lemma 2. Lemma 2 gives a bound on both coverage probability
and expected length. First consider coverage probability. It is easy to
see that the interval $CI^*_j$ contains the interval $CI_j$ defined as

$$CI_j = [\hat T_{k,j} - 2\xi_j,\ \hat T_{j,k} + 2\xi_j], \tag{30}$$

where the interval $CI_j$ is taken to be the empty set whenever the left endpoint
of the above interval is larger than the right endpoint. First note that
for $f \in \mathcal F_k$, $E\hat T_{k,j} - Tf \le \tfrac12\omega_{k,j}$ and that $E\hat T_{j,k} - Tf \ge -\tfrac12\omega_{j,k}$. Let

$$z_{k,j} = \frac{\hat T_{k,j} - Tf - \tfrac12\omega_{k,j}}{\omega_{k,j}/z_{\alpha/2}}, \qquad z_{j,k} = \frac{\hat T_{j,k} - Tf + \tfrac12\omega_{j,k}}{\omega_{j,k}/z_{\alpha/2}}.$$

Then for any $f \in \mathcal F_k$ it follows from (13)–(15) that $z_{k,j}$ has a Normal
distribution with mean less than or equal to zero and variance bounded by
1, and $z_{j,k}$ has a Normal distribution with mean greater than or equal to
zero and variance bounded by 1. Note that $\xi_j = \max(\omega_{k,j},\omega_{j,k})$. Hence for
$f \in \mathcal F_k$,

$$\begin{aligned}
P(Tf \notin CI^*_j) &\le P(Tf \notin CI_j)\\
&\le P\Big(z_{k,j} > \Big(\frac{2\xi_j}{\omega_{k,j}} - \frac12\Big)z_{\alpha/2}\Big) + P\Big(z_{j,k} < -\Big(\frac{2\xi_j}{\omega_{j,k}} - \frac12\Big)z_{\alpha/2}\Big)\\
&\le 2P\Big(Z > \frac32\,z_{\alpha/2}\Big).
\end{aligned}$$

Note that for a fixed $\lambda > 1$, it is easy to verify that $g(z) = P(Z > \lambda z)/P(Z >
z)$ is a strictly decreasing function of $z$ for $z > 0$, and for $\alpha = 0.2$,

$$2P\Big(Z > \frac32\,z_{\alpha/2}\Big) \le \frac13\,\alpha.$$










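Since $2P(Z > \tfrac32 z_{\alpha/2}) = \alpha\,g(z_{\alpha/2})$ and $g$ is decreasing, it follows that $P(Tf \notin CI^*_j) \le \tfrac13\alpha$ for all $0 < \alpha \le 0.2$, and so the claim of the required coverage probability has been established.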
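The two numerical facts used at the end of the coverage argument, that $g(z) = P(Z > \lambda z)/P(Z > z)$ is strictly decreasing and that $2P(Z > \tfrac32 z_{\alpha/2})$ stays below $\alpha/3$ for $\alpha \le 0.2$, can be checked on a grid. The sketch computes Normal tail probabilities with `math.erfc`, which stays accurate far into the tail where `1 - cdf` would lose precision; the grids and function names are arbitrary choices for this illustration.

```python
import math
from statistics import NormalDist


def upper_tail(x):
    """P(Z > x) via erfc, accurate far into the tail."""
    return 0.5 * math.erfc(x / math.sqrt(2.0))


def g(z, lam=1.5):
    """The ratio P(Z > lam z) / P(Z > z) from the proof of Lemma 2."""
    return upper_tail(lam * z) / upper_tail(z)


def lemma2_miss_bound(alpha, factor=1.5):
    """2 P(Z > factor * z_{alpha/2}); factor 1.5 for (27), 2.5 for (28)."""
    z = NormalDist().inv_cdf(1 - alpha / 2)
    return 2.0 * upper_tail(factor * z)


print(lemma2_miss_bound(0.2), 0.2 / 3)
```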

Now turn to the bound on expected length given in (29) for which the 
following technical lemma is needed. 


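Lemma 3. Let $X \sim N(\mu,\sigma^2)$ with $\mu \le \mu_0$ and $0 < \sigma \le \sigma_0$. Then

$$EX\,1(X > 0) \le \mu_0\,\Phi\Big(\frac{\mu_0}{\sigma_0}\Big) + \frac{\sigma_0}{\sqrt{2\pi}}\exp\Big(-\frac{\mu_0^2}{2\sigma_0^2}\Big). \tag{31}$$

Proof. It is easy to check by taking partial derivatives that $EX\,1(X >
0)$ is an increasing function of both $\mu$ and $\sigma$. Hence

$$\begin{aligned}
E_{\mu,\sigma}X\,1(X > 0) &\le E_{\mu_0,\sigma_0}X\,1(X > 0) = \int_0^\infty \frac{x}{\sqrt{2\pi}\,\sigma_0}\exp\Big(-\frac{(x-\mu_0)^2}{2\sigma_0^2}\Big)\,dx\\
&= \mu_0\int_{-\mu_0/\sigma_0}^\infty \frac{1}{\sqrt{2\pi}}e^{-y^2/2}\,dy + \frac{\sigma_0}{\sqrt{2\pi}}\int_{-\mu_0/\sigma_0}^\infty y\,e^{-y^2/2}\,dy\\
&= \mu_0\,\Phi\Big(\frac{\mu_0}{\sigma_0}\Big) + \frac{\sigma_0}{\sqrt{2\pi}}\exp\Big(-\frac{\mu_0^2}{2\sigma_0^2}\Big).
\end{aligned}$$

□

Now note that for $f \in \mathcal F_j$,

$$E(\hat T_{j,k} - \hat T_{k,j}) \le \xi_j \quad\text{and}\quad \operatorname{Var}(\hat T_{j,k} - \hat T_{k,j}) \le \frac{4\xi_j^2}{z_{\alpha/2}^2},$$

and so from Lemma 3 it follows that

$$E(\hat T_{j,k} - \hat T_{k,j})_+ \le \xi_j\,\Phi\Big(\frac{z_{\alpha/2}}{2}\Big) + \frac{2\xi_j}{\sqrt{2\pi}\,z_{\alpha/2}}\exp\Big(-\frac{z_{\alpha/2}^2}{8}\Big), \tag{32}$$

and hence (29) is satisfied, since the length of $CI^*_j$ equals $2\{(\hat T_{j,k} - \hat T_{k,j})_+ + 2\xi_j\}$. □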
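Lemma 3 is the familiar closed form for the partial mean of a Normal variable. The sketch below evaluates it, compares it with a direct numerical integration (the trapezoidal rule is an arbitrary choice here), and then computes the expected length of (27) per unit of $\xi_j$ implied by (32), namely $2\times$(32)$/\xi_j + 4$, confirming numerically that it stays below 8 for $\alpha \le 0.2$, which is consistent with the constant 8 appearing in Theorem 2. Function names and test values are illustrative assumptions.

```python
import math
from statistics import NormalDist


def phi(x):
    """Standard Normal density."""
    return math.exp(-0.5 * x * x) / math.sqrt(2.0 * math.pi)


def Phi(x):
    """Standard Normal CDF via erfc."""
    return 0.5 * math.erfc(-x / math.sqrt(2.0))


def positive_part_mean(mu, sigma):
    """E[X 1(X > 0)] for X ~ N(mu, sigma^2): the right-hand side of (31) at (mu, sigma)."""
    return mu * Phi(mu / sigma) + sigma * phi(mu / sigma)


def numeric_positive_part_mean(mu, sigma, steps=100000):
    """Trapezoidal approximation of the integral in the proof of Lemma 3."""
    upper = max(mu, 0.0) + 12.0 * sigma
    h = upper / steps
    total = 0.0
    for i in range(steps + 1):
        x = i * h
        weight = 0.5 if i in (0, steps) else 1.0
        total += weight * x * phi((x - mu) / sigma) / sigma
    return total * h


def expected_length_factor(alpha):
    """Bound on L(CI*_j, F_j) / xi_j from (27) and (32): 2 * (32 with xi_j = 1) + 4."""
    z = NormalDist().inv_cdf(1 - alpha / 2)
    return 2.0 * positive_part_mean(1.0, 2.0 / z) + 4.0


print(positive_part_mean(0.3, 1.2), numeric_positive_part_mean(0.3, 1.2))
print(expected_length_factor(0.2))
```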


Lemma 2 shows that the interval $CI^*_j$ has guaranteed coverage over $\mathcal F_k$
and near optimal expected length over $\mathcal F_j$. Before turning to the construction
of an adaptive confidence interval we state a simple preliminary lemma. The
proof is straightforward and not given here.


Lemma 4. Let $0 < \xi_1 \le \xi_2 \le \cdots \le \xi_k$ be a monotonically nondecreasing
sequence of positive numbers. Then there exists a unique subsequence $\xi_{j_1} <
\xi_{j_2} < \cdots < \xi_{j_m}$ with $j_m = k$, such that for all $1 \le i \le m$,

$$\xi_{j_i} > 2\xi_{j_{i-1}} \quad\text{and}\quad \xi_{j_i} \le 2\xi_j \ \text{ for all } j_{i-1} < j \le j_i, \tag{33}$$

where we set $j_0 = 0$ and $\xi_0 = 0$.
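The paper does not give the proof of Lemma 4. One natural way to construct the subsequence, sketched below as an assumption rather than as the authors' argument, is to work down from $j_m = k$: given $j_i$, take $j_{i-1}$ to be the largest $j < j_i$ with $2\xi_j < \xi_{j_i}$, stopping when no such $j$ exists. Every skipped index then satisfies $\xi_{j_i} \le 2\xi_j$, which is (33). The function names and example sequences are hypothetical.

```python
def doubling_subsequence(xi):
    """Indices j_1 < ... < j_m = k (1-based) satisfying (33) for nondecreasing positive xi.

    Built greedily from the top: j_{i-1} is the largest j < j_i with 2 xi_j < xi_{j_i}.
    """
    k = len(xi)
    chosen = [k]
    current = k
    while True:
        j = current - 1
        while j >= 1 and 2 * xi[j - 1] >= xi[current - 1]:
            j -= 1
        if j == 0:
            break
        chosen.append(j)
        current = j
    return chosen[::-1]


def satisfies_33(xi, idx):
    """Check both conditions of (33), with j_0 = 0 and xi_0 = 0."""
    x = [0.0] + list(xi)               # x[j] = xi_j with 1-based indexing
    prev = 0
    for ji in idx:
        if not x[ji] > 2 * x[prev]:
            return False
        if any(x[ji] > 2 * x[j] for j in range(prev + 1, ji + 1)):
            return False
        prev = ji
    return idx[-1] == len(xi)


xi = [1.0, 1.5, 2.5, 3.0, 7.0, 8.0, 20.0]
idx = doubling_subsequence(xi)
print(idx, satisfies_33(xi, idx))   # [1, 4, 6, 7] True
```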











NONPARAMETRIC CONFIDENCE INTERVALS 




The construction of the adaptive confidence interval proceeds as follows.
Once again for $1 \le j \le k$, set $\xi_j = \omega_+(z_{\alpha/2}/\sqrt n,\mathcal F_j,\mathcal F_k)$. Let $\xi_{j_1} < \xi_{j_2} < \cdots < \xi_{j_m}$
be the subsequence satisfying (33). Let $\hat j$ be the index of the shortest interval
among all the $CI^*_{j_i}$ for $1 \le i \le m$. More precisely,

$$\hat j = \mathop{\arg\min}_{j_i:\,1 \le i \le m} L(CI^*_{j_i}).$$

Then the adaptive confidence interval for $Tf$ is defined by

$$CI^* = CI^*_{\hat j}. \tag{34}$$

The following theorem shows that Cl* is a 1 — a level adaptive confidence 
interval over the collection {Tj,j = 1 ,,k}. 
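The selection rule (34) is easy to mimic once realized estimates are available. In this sketch each candidate $CI^*_j$ is taken, as in the proof of Lemma 5 and in (43), to be centred at $(\hat T_{j,k}+\hat T_{k,j})/2$ with half-width $(\hat T_{j,k}-\hat T_{k,j})_++2\xi_j$; that form, the helper names and the numbers in the example are assumptions for illustration only.

```python
def nested_interval(t_jk, t_kj, xi_j, c=2.0):
    """Interval [m - h, m + h] with m = (t_jk + t_kj)/2 and h = (t_jk - t_kj)_+ + c * xi_j."""
    mid = 0.5 * (t_jk + t_kj)
    half_width = max(t_jk - t_kj, 0.0) + c * xi_j
    return (mid - half_width, mid + half_width)

def adaptive_interval(t_jk, t_kj, xi, subsequence, c=2.0):
    """Return (j_hat, interval): the shortest candidate among the 1-based indices in `subsequence`."""
    candidates = {j: nested_interval(t_jk[j - 1], t_kj[j - 1], xi[j - 1], c) for j in subsequence}
    j_hat = min(subsequence, key=lambda j: candidates[j][1] - candidates[j][0])
    return j_hat, candidates[j_hat]

if __name__ == "__main__":
    # Invented realized values of T_{j,k}, T_{k,j} and xi_j for j = 1, 2, 3.
    print(adaptive_interval([0.5, 0.1, 0.5], [0.0, 0.2, 0.4], [0.1, 0.25, 0.6], [1, 2, 3]))
```

Note that every candidate has length at least $4\xi_j$, which is what the proof of Theorem 2 exploits when bounding $P(\hat j\ge j(l)+m)$.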

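Theorem 2. The confidence interval $CI^*$ defined in (34) has coverage probability of at least $1-\alpha$ for all $f\in\mathcal{F}_k$, that is, $CI^*\in\mathcal{I}_{\alpha,\mathcal{F}_k}$, and satisfies

$$L^*_\alpha(\mathcal{F}_j,\mathcal{F}_k)\le L(CI^*,\mathcal{F}_j)\le\frac{16\,z_{\alpha/2}}{(1/2-\alpha)\,z_\alpha}\,L^*_\alpha(\mathcal{F}_j,\mathcal{F}_k)\tag{35}$$

simultaneously for all $1\le j\le k$. Moreover,

$$L(CI^*,\mathcal{F}_{j_i})\le8\,\omega_+\Bigl(\frac{z_{\alpha/2}}{\sqrt n},\mathcal{F}_{j_i},\mathcal{F}_k\Bigr)\tag{36}$$

for all $1\le i\le m$, and for all $1\le j\le k$

$$L(CI^*,\mathcal{F}_j)\le16\,\omega_+\Bigl(\frac{z_{\alpha/2}}{\sqrt n},\mathcal{F}_j,\mathcal{F}_k\Bigr).\tag{37}$$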


The proof of Theorem 2 rests on the following important technical lemma. Recall that Lemma 2 gives a lower bound on coverage over $\mathcal{F}_k$ and an upper bound on expected length over $\mathcal{F}_j$. Lemma 5 shows, in a precise way, that if $CI^*_j$ has a large expected length it must have high coverage probability.

Lemma 5. If $f\in\mathcal{F}_k$ and

$$P(Tf\notin CI^*_j)>2P\Bigl(Z>\frac14(A+3)z_{\alpha/2}\Bigr),$$

then

$$E(\hat T_{j,k}-\hat T_{k,j})\le A\,\xi_j.$$


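Proof. First note that, since $(x)_+\ge x$,

$$P(Tf\notin CI^*_j)\le P\Bigl(Tf<\frac{\hat T_{j,k}+\hat T_{k,j}}{2}-(\hat T_{j,k}-\hat T_{k,j}+2\xi_j)\Bigr)+P\Bigl(Tf>\frac{\hat T_{j,k}+\hat T_{k,j}}{2}+(\hat T_{j,k}-\hat T_{k,j}+2\xi_j)\Bigr).$$

Now note that

$$-\frac{\xi_j}{2}-\frac12E(\hat T_{j,k}-\hat T_{k,j})\le E\Bigl(\frac{\hat T_{j,k}+\hat T_{k,j}}{2}\Bigr)-Tf\le\frac{\xi_j}{2}+\frac12E(\hat T_{j,k}-\hat T_{k,j}).$$

Suppose that

$$E(\hat T_{j,k}-\hat T_{k,j})>A\,\xi_j.$$

Let

$$X=\frac{\hat T_{j,k}+\hat T_{k,j}}{2}-Tf-(\hat T_{j,k}-\hat T_{k,j}+2\xi_j).$$

Then

$$E(X)<-\frac12(A+3)\xi_j\quad\text{and}\quad\operatorname{Var}(X)\le\frac{4\xi_j^2}{z_{\alpha/2}^2}.$$

Hence

$$P\Bigl(Tf<\frac{\hat T_{j,k}+\hat T_{k,j}}{2}-(\hat T_{j,k}-\hat T_{k,j}+2\xi_j)\Bigr)=P(X>0)\le P\Bigl(Z>\frac14(A+3)z_{\alpha/2}\Bigr).$$

Similarly,

$$P\Bigl(Tf>\frac{\hat T_{j,k}+\hat T_{k,j}}{2}+(\hat T_{j,k}-\hat T_{k,j}+2\xi_j)\Bigr)\le P\Bigl(Z>\frac14(A+3)z_{\alpha/2}\Bigr).$$

Hence,

$$P(Tf\notin CI^*_j)\le2P\Bigl(Z>\frac14(A+3)z_{\alpha/2}\Bigr),$$

which contradicts the hypothesis, and so $E(\hat T_{j,k}-\hat T_{k,j})\le A\xi_j$. □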


Proof of Theorem 2. Note that it suffices to prove (36) since (37) follows immediately from (36), and (35) is a direct consequence of (20), (7) and (37). For (36) assume without loss of generality that $\xi_j>2\xi_{j-1}$ for all $1<j\le k$; otherwise we can work along the subsequence. First note that since $CI^*_{\hat j}$ is the shortest of all the $CI^*_j$ confidence intervals, Lemma 2 yields that the expected length of $CI^*_{\hat j}$ satisfies

$$L(CI^*_{\hat j},\mathcal{F}_j)\le L(CI^*_j,\mathcal{F}_j)\le\Bigl\{2\Phi\Bigl(\frac{z_{\alpha/2}}{2}\Bigr)+\frac{4}{\sqrt{2\pi}\,z_{\alpha/2}}\exp\Bigl(-\frac{z_{\alpha/2}^2}{8}\Bigr)+4\Bigr\}\xi_j\le8\xi_j.\tag{38}$$

Now turn to the proof of coverage. Note that

$$P(Tf\notin CI^*)=\sum_{j=1}^kP(Tf\notin CI^*_j,\ \hat j=j)\le\sum_{j=1}^k\min\{P(Tf\notin CI^*_j),P(\hat j=j)\}.\tag{39}$$
For $l\ge0$, denote $d(l)=2P\bigl(Z>\frac14(l+6)z_{\alpha/2}\bigr)$. Note that $d(0)=2P\bigl(Z>\frac32z_{\alpha/2}\bigr)\le\frac{3}{10}\alpha$. For $l\ge1$ let

$$A_l=\{j:d(l)<P(Tf\notin CI^*_j)\le d(l-1)\}\tag{40}$$

and let $j(l)=\min\{j:j\in A_l\}$. Note that it follows from Lemma 2 that $\bigcup_lA_l=\{j\ge1\}$. Then by Lemma 5, applied with $A=l+3$,

$$E(\hat T_{j(l),k}-\hat T_{k,j(l)})\le(l+3)\xi_{j(l)}.\tag{41}$$

Note that $\operatorname{Var}(\hat T_{j(l),k}-\hat T_{k,j(l)})\le4\xi_{j(l)}^2/z_{\alpha/2}^2$, so

$$P\bigl(L(CI^*_{j(l)})>p\,\xi_{j(l)}\bigr)=P\Bigl(\hat T_{j(l),k}-\hat T_{k,j(l)}>\frac12(p-4)\xi_{j(l)}\Bigr)\le P\Bigl(Z>\Bigl(\frac p4-\frac52-\frac l2\Bigr)z_{\alpha/2}\Bigr).$$

Since $\xi_j>2\xi_{j-1}$, it follows that, for any integer $m>0$,

$$P(\hat j\ge j(l)+m)\le P\bigl(L(CI^*_{j(l)})\ge4\xi_{j(l)+m}\bigr)\le P\bigl(L(CI^*_{j(l)})\ge4\cdot2^m\xi_{j(l)}\bigr)\le P\Bigl(Z>\Bigl(2^m-\frac52-\frac l2\Bigr)z_{\alpha/2}\Bigr)\equiv\gamma_{l,m}.$$

Let $j^*=\min\{j(l):1\le l\le8\}$. For $m=3$ and $1\le l\le8$, $\gamma_{l,3}=P\bigl(Z>\frac12(11-l)z_{\alpha/2}\bigr)$. If $j^*=j(1)$, then

$$\sum_{j=j^*}^k\min\{P(Tf\notin CI^*_j),P(\hat j=j)\}\le d(0)+d(0)+d(0)+\gamma_{1,3}\le\frac{9}{10}\alpha+P(Z>5z_{\alpha/2}).$$

Similarly, if $j^*=j(l)$ for some $2\le l\le8$, then

$$\sum_{j=j^*}^k\min\{P(Tf\notin CI^*_j),P(\hat j=j)\}\le d(l-1)+d(0)+d(0)+\gamma_{l,3}\le\frac{9}{10}\alpha+P(Z>5z_{\alpha/2}).$$

Hence

$$\sum_{j=j^*}^k\min\{P(Tf\notin CI^*_j),P(\hat j=j)\}\le\frac{9}{10}\alpha+P(Z>5z_{\alpha/2}).\tag{42}$$

The following simple lemma can be used to bound $P(Z>5z_{\alpha/2})$.

Lemma 6. Let $Z\sim N(0,1)$ and let $a>0$ and $b>0$ be two constants. Then

$$P(Z>a+b)\le\exp\Bigl(-\Bigl(ab+\frac12b^2\Bigr)\Bigr)P(Z>a).$$


Applying Lemma 6 with $a=z_{\alpha/2}$ and $b=4z_{\alpha/2}$, it follows that

$$P(Z>5z_{\alpha/2})=P(Z>z_{\alpha/2}+4z_{\alpha/2})\le\exp(-12z_{\alpha/2}^2)\cdot\frac{\alpha}{2}\le\frac{\alpha}{20}.$$
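Lemma 6 follows from the substitution $x\mapsto x+b$: $P(Z>a+b)=\int_a^\infty\phi(x)e^{-xb-b^2/2}\,dx\le e^{-ab-b^2/2}P(Z>a)$. The sketch below (standard library only; names are ours) checks the inequality on a grid and compares $P(Z>5z_{\alpha/2})$ with $\exp(-12z_{\alpha/2}^2)\,\alpha/2$ for a few levels.

```python
import math
from statistics import NormalDist

def tail(x):
    """P(Z > x) for Z ~ N(0, 1)."""
    return 0.5 * math.erfc(x / math.sqrt(2.0))

def lemma6_gap(a, b):
    """Right-hand side minus left-hand side of Lemma 6; nonnegative if the lemma holds."""
    return math.exp(-(a * b + 0.5 * b * b)) * tail(a) - tail(a + b)

def five_z_bound(alpha):
    """Return (P(Z > 5 z_{alpha/2}), exp(-12 z_{alpha/2}^2) * alpha / 2)."""
    z = -NormalDist().inv_cdf(alpha / 2.0)
    return tail(5.0 * z), math.exp(-12.0 * z * z) * alpha / 2.0

if __name__ == "__main__":
    for alpha in (0.2, 0.1, 0.05):
        print(alpha, five_z_bound(alpha))
```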


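Therefore

$$P(Tf\notin CI^*)\le\sum_{j=1}^k\min\{P(Tf\notin CI^*_j),P(\hat j=j)\}\le\frac{19}{20}\alpha+\sum_{l=9}^\infty\sum_{j\in A_l}\min\{P(Tf\notin CI^*_j),P(\hat j=j)\}.$$

For $l\ge9$, let $m_l$ be the smallest integer satisfying $2^{m_l}\ge\frac14(3l+7)$. Then $m_l\le\log_2(3l+7)-1$. Recall that for $j\in A_l$, $P(Tf\notin CI^*_j)\le2P\bigl(Z>\frac14(l+3)z_{\alpha/2}\bigr)$. Now note that

$$P(\hat j\ge j(l)+m_l)\le\gamma_{l,m_l}\le P\Bigl(Z>\frac14(l+3)z_{\alpha/2}\Bigr).$$

So, for $l\ge9$,

$$\sum_{j\in A_l}\min\{P(Tf\notin CI^*_j),P(\hat j=j)\}\le m_l\cdot2P\Bigl(Z>\frac14(l+3)z_{\alpha/2}\Bigr)+\gamma_{l,m_l}\le(2m_l+1)P\Bigl(Z>\frac14(l+3)z_{\alpha/2}\Bigr).$$

So

$$\sum_{l=9}^\infty\sum_{j\in A_l}\min\{P(Tf\notin CI^*_j),P(\hat j=j)\}\le\sum_{l=9}^\infty\bigl(2\log_2(3l+7)-1\bigr)P\Bigl(Z>\frac14(l+3)z_{\alpha/2}\Bigr).$$

Lemma 6 yields

$$P\Bigl(Z>\frac14(l+3)z_{\alpha/2}\Bigr)=P\Bigl(Z>z_{\alpha/2}+\frac14(l-1)z_{\alpha/2}\Bigr)\le\exp\Bigl(-\Bigl(\frac14(l-1)+\frac1{32}(l-1)^2\Bigr)z_{\alpha/2}^2\Bigr)\frac{\alpha}{2}.$$

Hence,

$$\begin{aligned}
\sum_{l=9}^\infty\sum_{j\in A_l}\min\{P(Tf\notin CI^*_j),P(\hat j=j)\}&\le\frac\alpha2\sum_{l=9}^\infty\bigl(2\log_2(3l+7)-1\bigr)\exp\Bigl(-\Bigl(\frac14(l-1)+\frac1{32}(l-1)^2\Bigr)z_{\alpha/2}^2\Bigr)\\
&=\frac\alpha2\sum_{l=8}^\infty\bigl(2\log_2(3l+10)-1\bigr)\exp\Bigl(-\frac{z_{\alpha/2}^2}{4}l\Bigr)\exp\Bigl(-\frac{z_{\alpha/2}^2}{32}l^2\Bigr).
\end{aligned}$$

It is easy to see that for $l\ge8$, $\bigl(2\log_2(3l+10)-1\bigr)\exp\bigl(-(z_{\alpha/2}^2/32)l^2\bigr)$ is strictly decreasing and, for $0<\alpha\le0.2$,

$$\bigl(2\log_2(3l+10)-1\bigr)\exp\Bigl(-\frac{z_{\alpha/2}^2}{32}l^2\Bigr)\le\bigl(2\log_234-1\bigr)\exp(-2z_{\alpha/2}^2)\le\frac12.$$

So,

$$\sum_{l=9}^\infty\sum_{j\in A_l}\min\{P(Tf\notin CI^*_j),P(\hat j=j)\}\le\frac\alpha4\sum_{l=8}^\infty\exp\Bigl(-\frac{z_{\alpha/2}^2}{4}l\Bigr)\le\frac1{20}\alpha.$$

Hence,

$$P(Tf\notin CI^*)\le\frac{19}{20}\alpha+\frac1{20}\alpha=\alpha.$$
□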
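The end of the coverage argument controls the series $\frac\alpha2\sum_{l\ge8}(2\log_2(3l+10)-1)\exp(-\frac{z_{\alpha/2}^2}{4}l)\exp(-\frac{z_{\alpha/2}^2}{32}l^2)$. A quick numerical sketch (standard library only; the truncation point `l_max` is our choice, and later terms underflow to zero) evaluates that series divided by $\alpha$ and confirms it stays well below $1/20$ for several $\alpha\le0.2$.

```python
import math
from statistics import NormalDist

def tail_series_over_alpha(alpha, l_max=400):
    """(1/alpha) * (alpha/2) * sum_{l>=8} (2 log2(3l+10) - 1) exp(-(z^2/4) l - (z^2/32) l^2)."""
    z2 = NormalDist().inv_cdf(1.0 - alpha / 2.0) ** 2
    total = 0.0
    for l in range(8, l_max):
        total += (2.0 * math.log2(3 * l + 10) - 1.0) * math.exp(-(z2 / 4.0) * l - (z2 / 32.0) * l * l)
    return 0.5 * total

if __name__ == "__main__":
    for alpha in (0.2, 0.1, 0.05, 0.01):
        print(alpha, tail_series_over_alpha(alpha))
```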

3.1. Adaptation over nearly nested parameter spaces. In some common cases of interest, such as Hölder spaces, Sobolev spaces and Besov spaces, the parameter spaces are not exactly nested but have a nested structure in terms of the moduli of continuity. Theorem 2 can be generalized to such nearly nested parameter spaces.

Denote by $\mathrm{C.Hull}(\mathcal{F})$ the convex hull of a parameter set $\mathcal{F}$. Let $\mathcal{F}_i$, $i=1,\dots,k$, be convex parameter spaces and for any integer $1\le m\le k$ let $\mathcal{G}_m=\bigcup_{i=1}^m\mathcal{F}_i$. Suppose the following condition, which is trivially satisfied if the $\mathcal{F}_i$ are nested, holds.

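Condition C. For $1\le j\le k$ and some constants $C_2\ge C_1>0$,

$$\omega(\varepsilon,\mathrm{C.Hull}(\mathcal{G}_j),\mathrm{C.Hull}(\mathcal{G}_k))\le C_1\,\omega(\varepsilon,\mathcal{G}_j,\mathcal{G}_k)\le C_2\,\omega(\varepsilon,\mathcal{F}_j,\mathcal{F}_k)$$

and

$$\omega(\varepsilon,\mathrm{C.Hull}(\mathcal{G}_k),\mathrm{C.Hull}(\mathcal{G}_j))\le C_1\,\omega(\varepsilon,\mathcal{G}_k,\mathcal{G}_j)\le C_2\,\omega(\varepsilon,\mathcal{F}_k,\mathcal{F}_j)$$

for all $0<\varepsilon<\infty$.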
Similarly to the nested case, for $1\le i,j\le k$ set $\omega'_{i,j}=\omega\bigl(\frac{z_{\alpha/2}}{\sqrt n},\mathrm{C.Hull}(\mathcal{G}_i),\mathrm{C.Hull}(\mathcal{G}_j)\bigr)$, and once again Cai and Low (2004) give a construction of linear estimators $\hat T'_{i,j}$ which have variance bounded by

$$\operatorname{Var}(\hat T'_{i,j})\le\frac{\omega'^2_{i,j}}{z_{\alpha/2}^2}$$

and bias which satisfies

$$\sup_{f\in\mathcal{F}_i}\bigl(E(\hat T'_{i,j})-Tf\bigr)\le\frac12\omega'_{i,j}.$$


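Set $\xi'_j=\omega_+\bigl(\frac{z_{\alpha/2}}{\sqrt n},\mathrm{C.Hull}(\mathcal{G}_j),\mathrm{C.Hull}(\mathcal{G}_k)\bigr)$ and define the confidence intervals $CI^*_j$ as earlier. When $0<\alpha\le0.2$, let

$$CI^*_j=\Bigl[\frac{\hat T'_{j,k}+\hat T'_{k,j}}{2}-\bigl\{(\hat T'_{j,k}-\hat T'_{k,j})_++2\xi'_j\bigr\},\ \frac{\hat T'_{j,k}+\hat T'_{k,j}}{2}+\bigl\{(\hat T'_{j,k}-\hat T'_{k,j})_++2\xi'_j\bigr\}\Bigr],\tag{43}$$

and when $0.2<\alpha<0.5$ let

$$CI^*_j=\Bigl[\frac{\hat T'_{j,k}+\hat T'_{k,j}}{2}-\bigl\{(\hat T'_{j,k}-\hat T'_{k,j})_++3\xi'_j\bigr\},\ \frac{\hat T'_{j,k}+\hat T'_{k,j}}{2}+\bigl\{(\hat T'_{j,k}-\hat T'_{k,j})_++3\xi'_j\bigr\}\Bigr].\tag{44}$$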


Following the argument given in the nested case, let $\xi'_{j_1}<\xi'_{j_2}<\cdots<\xi'_{j_m}$ be a subsequence of the $\xi'_j$ satisfying (33), let $\hat j=\arg\min_{j_i:\,1\le i\le m}L(CI^*_{j_i})$ be the index of the shortest interval along the subsequence, and define the adaptive confidence interval for $Tf$ by

$$CI^*=CI^*_{\hat j}.\tag{45}$$

As stated precisely in the following result, this confidence interval is adaptive over the parameter spaces $\{\mathcal{F}_j:j=1,\dots,k\}$.


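Proposition 3. Suppose Condition C holds. Then the confidence interval $CI^*$ defined in (45) has coverage probability of at least $1-\alpha$ for all $f\in\mathcal{F}=\bigcup_{j=1}^k\mathcal{F}_j$ and satisfies the following bounds on expected length,

$$L^*_\alpha(\mathcal{F}_j,\mathcal{F})\le L(CI^*,\mathcal{F}_j)\le C(\alpha)\,L^*_\alpha(\mathcal{F}_j,\mathcal{F}),\tag{46}$$

simultaneously for all $1\le j\le k$, where the constant $C(\alpha)$ depends only on $\alpha$ and is independent of $k$. In other words, $L(CI^*,\mathcal{F}_j)\asymp\omega_+\bigl(\frac{z_{\alpha/2}}{\sqrt n},\mathcal{F}_j,\mathcal{F}\bigr)$ for all $1\le j\le k$.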
We omit the proof of Proposition 3 since it follows essentially the same path as that of Theorem 2.


3.2. Examples. Theorem 2 and Proposition 3 have established general 
adaptation results for collections of nested or nearly nested parameter spaces. 
In this section a couple of examples are given which illustrate this general 
theory. 

Suppose that we observe the white noise with drift process (1) and that the linear functional is point evaluation. For convenience take $Tf=f(0)$. Let $\mathcal{D}$ be the set of all decreasing functions on $[-\frac12,\frac12]$ and let $\mathcal{F}_D(\beta,M)=\mathcal{F}(\beta,M)\cap\mathcal{D}$ be the collection of monotonically decreasing Lipschitz functions, where $\mathcal{F}(\beta,M)$ is the Lipschitz class defined in (22).

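For integer $j\ge1$ let $M_j=2^{j(2\beta+1)}/\sqrt n$ and let $\mathcal{G}=\bigcup_{j=1}^\infty\mathcal{F}_D(\beta,M_j)$. Standard calculations as in, for example, Donoho and Liu (1987), yield

$$\omega(\varepsilon,\mathcal{F}_D(\beta,M),\mathcal{G})=\omega(\varepsilon,\mathcal{G},\mathcal{F}_D(\beta,M))=(2\beta+1)^{1/(2\beta+1)}M^{1/(2\beta+1)}\varepsilon^{2\beta/(2\beta+1)}\tag{47}$$

for $M\ge(2\beta+1)^{1/2}\varepsilon$. Let $\xi_j=\omega\bigl(\frac{z_{\alpha/2}}{\sqrt n},\mathcal{F}_D(\beta,M_j),\mathcal{G}\bigr)$. Then it is easy to see that $\xi_{j+1}=2\xi_j$, and hence the adaptive confidence interval given in (34) has coverage probability over $\mathcal{G}$ of at least $1-\alpha$ and satisfies

$$L(CI^*,\mathcal{F}_D(\beta,M_j))\le6(2\beta+1)^{1/(2\beta+1)}M_j^{1/(2\beta+1)}z_{\alpha/2}^{2\beta/(2\beta+1)}n^{-\beta/(2\beta+1)}.\tag{48}$$

Furthermore, for any $M>0$,

$$L(CI^*,\mathcal{F}_D(\beta,M))\le12(2\beta+1)^{1/(2\beta+1)}M^{1/(2\beta+1)}z_{\alpha/2}^{2\beta/(2\beta+1)}n^{-\beta/(2\beta+1)}\tag{49}$$

for all sufficiently large $n$.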
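To see why $\xi_{j+1}=2\xi_j$, note that the right side of (47) depends on $M$ only through $M^{1/(2\beta+1)}$, and $M_{j+1}^{1/(2\beta+1)}=2M_j^{1/(2\beta+1)}$. The sketch below evaluates (47) at $\varepsilon=z_{\alpha/2}/\sqrt n$ with $M_j=2^{j(2\beta+1)}/\sqrt n$ as read from the text; it applies (47) without checking its range condition, and the function names are ours.

```python
import math

def modulus(eps, beta, M):
    """Right-hand side of (47): (2b+1)^{1/(2b+1)} M^{1/(2b+1)} eps^{2b/(2b+1)}."""
    p = 2.0 * beta + 1.0
    return p ** (1.0 / p) * M ** (1.0 / p) * eps ** (2.0 * beta / p)

def xi_sequence(n, beta, z, j_max):
    """xi_j = modulus(z / sqrt(n), beta, M_j) with M_j = 2^{j(2 beta + 1)} / sqrt(n), j = 1..j_max."""
    eps = z / math.sqrt(n)
    return [modulus(eps, beta, 2.0 ** (j * (2.0 * beta + 1.0)) / math.sqrt(n)) for j in range(1, j_max + 1)]

if __name__ == "__main__":
    xs = xi_sequence(n=10_000, beta=0.5, z=1.96, j_max=5)
    print([round(b / a, 12) for a, b in zip(xs, xs[1:])])
```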

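Another common problem in function estimation is to adapt over smoothness classes. For fixed $M>0$, $\mathcal{F}_D(\gamma_1,M)\subset\mathcal{F}_D(\gamma_2,M)$ whenever $0<\gamma_2<\gamma_1\le1$. Let $\mathcal{G}'=\bigcup_{0<\gamma\le1}\mathcal{F}_D(\gamma,M)$. Then once again standard calculations yield

$$\omega(\varepsilon,\mathcal{F}_D(\beta,M),\mathcal{G}')=\omega(\varepsilon,\mathcal{G}',\mathcal{F}_D(\beta,M))=(2\beta+1)^{1/(2\beta+1)}M^{1/(2\beta+1)}\varepsilon^{2\beta/(2\beta+1)}.$$

Now let $1=\beta_1>\beta_2>\cdots$ be the sequence such that

$$\omega_+\Bigl(\frac{z_{\alpha/2}}{\sqrt n},\mathcal{F}_D(\beta_{j+1},M),\mathcal{G}'\Bigr)=2\,\omega_+\Bigl(\frac{z_{\alpha/2}}{\sqrt n},\mathcal{F}_D(\beta_j,M),\mathcal{G}'\Bigr).$$

Then the adaptive confidence interval given in (34) has coverage probability over $\mathcal{G}'$ of at least $1-\alpha$ and satisfies

$$L(CI^*,\mathcal{F}_D(\beta_j,M))\le6(2\beta_j+1)^{1/(2\beta_j+1)}M^{1/(2\beta_j+1)}z_{\alpha/2}^{2\beta_j/(2\beta_j+1)}n^{-\beta_j/(2\beta_j+1)}.\tag{51}$$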
Furthermore, for any $0<\beta\le1$,

$$L(CI^*,\mathcal{F}_D(\beta,M))\le12(2\beta+1)^{1/(2\beta+1)}M^{1/(2\beta+1)}z_{\alpha/2}^{2\beta/(2\beta+1)}n^{-\beta/(2\beta+1)}\tag{52}$$

for all sufficiently large $n$.
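The smoothness indices $\beta_1=1>\beta_2>\cdots$ are defined implicitly by a doubling equation. Assuming $\varepsilon=z_{\alpha/2}/\sqrt n$ is small enough that the modulus formula is decreasing in $\beta$ (true when $M/\varepsilon>e$), each $\beta_{j+1}$ can be found by bisection, as in this hypothetical sketch; the sequence stops once doubling would exceed the limit $M$ of the formula as $\beta\to0$.

```python
import math

def modulus(eps, beta, M):
    """(2b+1)^{1/(2b+1)} M^{1/(2b+1)} eps^{2b/(2b+1)}, the common value of the moduli above."""
    p = 2.0 * beta + 1.0
    return p ** (1.0 / p) * M ** (1.0 / p) * eps ** (2.0 * beta / p)

def next_beta(beta_j, eps, M, iters=200):
    """Solve modulus(eps, beta, M) = 2 * modulus(eps, beta_j, M) for beta in (0, beta_j), or return None."""
    target = 2.0 * modulus(eps, beta_j, M)
    lo, hi = 1e-12, beta_j
    if modulus(eps, lo, M) <= target:  # doubling is no longer attainable
        return None
    for _ in range(iters):
        mid = 0.5 * (lo + hi)
        if modulus(eps, mid, M) > target:
            lo = mid  # modulus still too large: the root lies above mid
        else:
            hi = mid
    return 0.5 * (lo + hi)

def beta_sequence(n, z, M=1.0):
    """beta_1 = 1 > beta_2 > ... with successive moduli doubling."""
    eps = z / math.sqrt(n)
    betas = [1.0]
    while True:
        b = next_beta(betas[-1], eps, M)
        if b is None:
            return betas
        betas.append(b)

if __name__ == "__main__":
    print(beta_sequence(n=10**6, z=1.96))
```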

4. Adaptation over a general collection of convex parameter spaces. Section 3 focused on collections of nested parameter spaces. It has been shown that the between class modulus of continuity completely characterizes the optimal expected length of adaptive confidence intervals. One particularly interesting feature of the nested case is that the optimal expected length of the confidence intervals does not depend on the number of parameter spaces in the collection.

The nested case, although interesting, is somewhat special. In this section general finite collections of convex parameter spaces are considered. In this general setting the theory is more complicated, and the number of parameter spaces, say $k$, may also play a role in the optimal expected length of adaptive confidence intervals. For a fixed and finite number of parameter spaces the optimal expected length of adaptive intervals is still within a constant factor of the between class modulus of continuity. However, the constant factor in this case can depend on the number of parameter spaces. We construct adaptive confidence intervals which show that this constant factor does not grow faster than $\sqrt{\log k}$, and we give an example which shows that this factor is sometimes necessary.

Let $\{\mathcal{F}_j:j=1,\dots,k\}$ be a collection of convex spaces with nonempty intersections, that is, $\mathcal{F}_i\cap\mathcal{F}_j\ne\emptyset$ for all $i,j$. The objective is to construct an adaptive confidence interval for a linear functional $Tf$ which has guaranteed coverage probability of $1-\alpha$ over $\mathcal{G}=\bigcup_{j=1}^k\mathcal{F}_j$ and rate optimal expected length over each of the parameter spaces $\mathcal{F}_j$.

The adaptive confidence interval given in this section differs substantially from that given in the nested case. However, the general strategy for constructing adaptive confidence intervals in this setup is similar to that of the nested case. In particular, a key step is to first construct an interval which has optimal expected length over one of the parameter spaces while attaining coverage probability over the union of the parameter spaces.

4.1. Constrained optimal expected length confidence intervals. As mentioned above, it is convenient to construct a confidence interval which has the shortest possible expected length over a given $\mathcal{F}_j$ while maintaining coverage probability over $\mathcal{G}=\bigcup_{j=1}^k\mathcal{F}_j$.

First note that for any confidence interval $CI\in\mathcal{I}_{\alpha,\mathcal{G}}$, Theorem 1 yields a target for the expected length:

$$L(CI,\mathcal{F}_j)\ge\Bigl(\frac12-\alpha\Bigr)\omega_+\Bigl(\frac{z_\alpha}{\sqrt n},\mathcal{F}_j,\mathcal{G}\Bigr).\tag{53}$$
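As in Section 2.2, for $1\le i,j\le k$ set $\omega_{i,j}=\omega\bigl(\frac{z_{\alpha/2}}{\sqrt n},\mathcal{F}_i,\mathcal{F}_j\bigr)$ and let $\hat T_{i,j}$ be a linear estimator which has variance bounded by $\omega_{i,j}^2/z_{\alpha/2}^2$ and bias which satisfies

$$\inf_{f\in\mathcal{F}_j}\bigl(E(\hat T_{i,j})-Tf\bigr)\ge-\frac12\omega_{i,j}\tag{54}$$

and

$$\sup_{f\in\mathcal{F}_i}\bigl(E(\hat T_{i,j})-Tf\bigr)\le\frac12\omega_{i,j}.\tag{55}$$

As a first step in the construction of adaptive confidence intervals, define $CI^*_{j,\alpha}$ by

$$CI^*_{j,\alpha}=\Bigl[\min_{1\le i\le k}\Bigl\{\hat T_{i,j}-\frac32\omega_{i,j}\Bigr\},\ \max_{1\le i\le k}\Bigl\{\hat T_{j,i}+\frac32\omega_{j,i}\Bigr\}\Bigr].\tag{56}$$

The following result shows that this confidence interval attains the lower bound on the maximum expected length over $\mathcal{F}_j$ given in (53), up to the factor stated in (57), and satisfies the constraint that it has minimum coverage of $1-\alpha$ for all $f\in\mathcal{G}$.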
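Given realized estimates, (56) is a min/max over one column and one row of a $k\times k$ array. The sketch below stores $\hat T_{i,j}$ in `T[i][j]` and $\omega_{i,j}$ in `omega[i][j]` (0-based indices; the layout and names are our assumptions) and also forms the pairwise interval $CI_{m,j}=[\hat T_{m,j}-\frac32\omega_{m,j},\ \hat T_{j,m}+\frac32\omega_{j,m}]$ used in the coverage argument, so one can confirm that $CI^*_{j,\alpha}$ contains every $CI_{m,j}$.

```python
def constrained_interval(T, omega, j):
    """CI*_{j,alpha} of (56): [min_i (T[i][j] - 1.5*omega[i][j]), max_i (T[j][i] + 1.5*omega[j][i])]."""
    k = len(T)
    lower = min(T[i][j] - 1.5 * omega[i][j] for i in range(k))
    upper = max(T[j][i] + 1.5 * omega[j][i] for i in range(k))
    return lower, upper

def pairwise_interval(T, omega, m, j):
    """CI_{m,j} = [T_{m,j} - 1.5 omega_{m,j}, T_{j,m} + 1.5 omega_{j,m}]."""
    return T[m][j] - 1.5 * omega[m][j], T[j][m] + 1.5 * omega[j][m]

if __name__ == "__main__":
    T = [[0.0, 1.0], [2.0, 3.0]]        # invented estimates
    omega = [[1.0, 1.0], [1.0, 1.0]]    # invented moduli
    print(constrained_interval(T, omega, 0))  # (-1.5, 2.5)
```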


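Proposition 4. Let $\mathcal{F}_j$, $j=1,\dots,k$, be convex parameter spaces with $\mathcal{F}_i\cap\mathcal{F}_j\ne\emptyset$ for all $i,j$ and let $\mathcal{G}=\bigcup_{j=1}^k\mathcal{F}_j$. Let the interval $CI^*_{j,\alpha}$ be defined as in (56). Then $CI^*_{j,\alpha}\in\mathcal{I}_{\alpha,\mathcal{G}}$ and $CI^*_{j,\alpha}$ has expected length over $\mathcal{F}_j$ satisfying

$$L^*_\alpha(\mathcal{F}_j,\mathcal{G})\le L(CI^*_{j,\alpha},\mathcal{F}_j)\le\frac{8\sqrt{\log(k+1)}+4z_{\alpha/2}}{(1/2-\alpha)\,z_\alpha}\,L^*_\alpha(\mathcal{F}_j,\mathcal{G}).\tag{57}$$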


Remark 6. It follows from (59) that the expected length of the confidence interval $CI^*_{j,\alpha}$ is rate optimal as $n\to\infty$ as long as $k$ remains fixed.


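Proof of Proposition 4. First consider the coverage probability of the interval $CI^*_{j,\alpha}$. Suppose $f\in\mathcal{F}_m$ for some $1\le m\le k$. Note that the interval $CI^*_{j,\alpha}$ contains

$$CI_{m,j}=\Bigl[\hat T_{m,j}-\frac32\omega_{m,j},\ \hat T_{j,m}+\frac32\omega_{j,m}\Bigr].$$

The derivation below shows that the interval $CI_{m,j}$ has correct coverage probability. First note that for $f\in\mathcal{F}_m$, $E\hat T_{m,j}-Tf\le\frac12\omega_{m,j}$ and $E\hat T_{j,m}-Tf\ge-\frac12\omega_{j,m}$. Let

$$X_{m,j}=\frac{\hat T_{m,j}-Tf-\frac12\omega_{m,j}}{\omega_{m,j}/z_{\alpha/2}}\quad\text{and}\quad X_{j,m}=\frac{\hat T_{j,m}-Tf+\frac12\omega_{j,m}}{\omega_{j,m}/z_{\alpha/2}}.$$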
Then for any $f\in\mathcal{F}_m$ it follows from (54) and (55) that $X_{m,j}$ has a normal distribution with mean less than or equal to zero and variance bounded by 1, and $X_{j,m}$ has a normal distribution with mean greater than or equal to zero and variance bounded by 1. Hence, for $f\in\mathcal{F}_m$,

$$\begin{aligned}
P(Tf\in CI^*_{j,\alpha})&\ge P(Tf\in CI_{m,j})=P(X_{m,j}\le z_{\alpha/2}\text{ and }X_{j,m}\ge-z_{\alpha/2})\\
&\ge1-P(X_{m,j}>z_{\alpha/2})-P(X_{j,m}<-z_{\alpha/2})\ge1-\alpha.
\end{aligned}$$

So for any $f\in\mathcal{G}$, $P(Tf\in CI^*_{j,\alpha})\ge1-\alpha$, and thus coverage has been established.

The bounds on the expected length of these intervals can now be obtained by using the following technical lemma from Dudley [(1999), pages 56 and 57].

Lemma 7. Let $X_1,X_2,\dots,X_N$ be normally distributed random variables with mean 0 and variance at most $\sigma^2$. Then

$$E\max_{1\le i\le N}|X_i|\le\sigma\Bigl(2+\frac{1}{\sqrt{\log2}}\Bigr)\sqrt{\log(N+1)}.\tag{58}$$
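The point of Lemma 7 is that the expected maximum of $N$ centred Gaussians grows only like $\sqrt{\log N}$, which is where the $\sqrt{\log(k+1)}$ factor in (57) comes from. As an illustration only, this Monte Carlo sketch uses i.i.d. variables (a special case of the lemma, which allows arbitrary dependence) and compares with the standard bound $\sigma\sqrt{2\log(2N)}$; that bound is a well-known substitute, not Dudley's constant.

```python
import math
import random

def mc_expected_max_abs(N, sigma=1.0, reps=2000, seed=0):
    """Monte Carlo estimate of E max_{i<=N} |X_i| for i.i.d. X_i ~ N(0, sigma^2)."""
    rng = random.Random(seed)
    return sum(max(abs(rng.gauss(0.0, sigma)) for _ in range(N)) for _ in range(reps)) / reps

def gaussian_max_bound(N, sigma=1.0):
    """Standard bound sigma * sqrt(2 log(2N)), valid for centred Gaussians with variance <= sigma^2."""
    return sigma * math.sqrt(2.0 * math.log(2.0 * N))

if __name__ == "__main__":
    for N in (1, 10, 100):
        est = mc_expected_max_abs(N)
        print(N, round(est, 3), round(gaussian_max_bound(N), 3), round(est / math.sqrt(math.log(N + 1)), 3))
```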

Let

$$\xi_j=\omega_+\Bigl(\frac{z_{\alpha/2}}{\sqrt n},\mathcal{F}_j,\mathcal{G}\Bigr)=\max_{1\le i\le k}\max\{\omega_{j,i},\omega_{i,j}\}.$$

It is easy to see that the length of the interval $CI^*_{j,\alpha}$ is bounded by

$$L(CI^*_{j,\alpha})\le\max_i(\hat T_{j,i}-Tf)_++\max_i(Tf-\hat T_{i,j})_++3\xi_j.$$

Now note that if $f\in\mathcal{F}_j$, then for any $i$,

$$a_{j,i}=E(\hat T_{j,i}-Tf)\le\frac12\xi_j\quad\text{and}\quad b_{i,j}=E(Tf-\hat T_{i,j})\le\frac12\xi_j.$$

Also note that for any real numbers $x$ and $y$, $(x+y)_+\le(x)_++(y)_+$. So for $f\in\mathcal{F}_j$ the expected length of $CI^*_{j,\alpha}$ satisfies

$$\begin{aligned}
EL(CI^*_{j,\alpha})&\le E\max_i(\hat T_{j,i}-Tf)_++E\max_i(Tf-\hat T_{i,j})_++3\xi_j\\
&\le E\Bigl[\max_i\bigl\{(a_{j,i})_++(\hat T_{j,i}-Tf-a_{j,i})_+\bigr\}\Bigr]+E\Bigl[\max_i\bigl\{(b_{i,j})_++(Tf-\hat T_{i,j}-b_{i,j})_+\bigr\}\Bigr]+3\xi_j\\
&\le E\Bigl[\max_i(\hat T_{j,i}-Tf-a_{j,i})_+\Bigr]+E\Bigl[\max_i(Tf-\hat T_{i,j}-b_{i,j})_+\Bigr]+4\xi_j.
\end{aligned}$$


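It then follows from Lemma 7, applied to the centred normal variables $\hat T_{j,i}-Tf-a_{j,i}$ and $Tf-\hat T_{i,j}-b_{i,j}$ (whose variances are at most $\xi_j^2/z_{\alpha/2}^2$) together with $(x)_+\le|x|$, that

$$E_f\bigl(L(CI^*_{j,\alpha})\bigr)\le2\Bigl(2+\frac1{\sqrt{\log2}}\Bigr)\frac{\xi_j}{z_{\alpha/2}}\sqrt{\log(k+1)}+4\xi_j\le8\,\frac{\sqrt{\log(k+1)}}{z_{\alpha/2}}\,\xi_j+4\xi_j,$$

and it follows by taking the supremum over $\mathcal{F}_j$ that

$$L(CI^*_{j,\alpha},\mathcal{F}_j)\le\Bigl\{8\,\frac{\sqrt{\log(k+1)}}{z_{\alpha/2}}+4\Bigr\}\,\omega_+\Bigl(\frac{z_{\alpha/2}}{\sqrt n},\mathcal{F}_j,\mathcal{G}\Bigr).\tag{59}$$

The proposition now follows by combining (20), (7) and (59). □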


4.2. Adaptive confidence intervals. The intervals $CI^*_{j,\alpha}$ constructed in the last section have near optimal expected length over $\mathcal{F}_j$ but do not control the expected length over the other $\mathcal{F}_i$. In this section adaptive confidence intervals over $\{\mathcal{F}_j:1\le j\le k\}$ are formed by intersecting such intervals. For a fixed $k$, the resulting interval has rate optimal expected length over every parameter space $\mathcal{F}_j$, $1\le j\le k$. A Bonferroni approach is applied to the intervals of Section 4.1 to yield an adaptive confidence interval.

More specifically, define the confidence interval $CI^*_\alpha$ by

$$CI^*_\alpha=\bigcap_{j=1}^kCI^*_{j,\alpha/k},\tag{60}$$

where the $CI^*_{j,\alpha}$ are given in (56). The following theorem shows that this confidence interval has guaranteed coverage probability and also has near optimal expected length over $\mathcal{F}_j$ for each $1\le j\le k$.
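Operationally, (60) runs each of the $k$ constructions at level $\alpha/k$, so each uses the critical value $z_{\alpha/2k}$, and then intersects the resulting intervals; the union bound gives overall coverage $1-\alpha$. A minimal sketch with hypothetical helper names:

```python
from statistics import NormalDist

def bonferroni_quantile(alpha, k):
    """z_{alpha/(2k)}: the critical value used by each interval CI*_{j, alpha/k}."""
    return -NormalDist().inv_cdf(alpha / (2.0 * k))

def intersect(intervals):
    """Intersection of closed intervals given as (lo, hi) pairs; None if it is empty."""
    lo = max(a for a, _ in intervals)
    hi = min(b for _, b in intervals)
    return (lo, hi) if lo <= hi else None

if __name__ == "__main__":
    print([round(bonferroni_quantile(0.05, k), 3) for k in (1, 2, 10, 100)])
    print(intersect([(0.0, 3.0), (1.0, 4.0), (2.0, 5.0)]))  # (2.0, 3.0)
```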

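Theorem 3. Let $\mathcal{F}_j$, $j=1,\dots,k$, be convex parameter spaces with $\mathcal{F}_i\cap\mathcal{F}_j\ne\emptyset$ for all $i,j$ and let $\mathcal{G}=\bigcup_{j=1}^k\mathcal{F}_j$. Let the interval $CI^*_\alpha$ be given as in (60). Then $CI^*_\alpha\in\mathcal{I}_{\alpha,\mathcal{G}}$ and $CI^*_\alpha$ satisfies

$$L^*_\alpha(\mathcal{F}_j,\mathcal{G})\le L(CI^*_\alpha,\mathcal{F}_j)\le\frac{12\,z_{\alpha/2k}}{(1/2-\alpha)\,z_\alpha}\,L^*_\alpha(\mathcal{F}_j,\mathcal{G})\tag{61}$$

for all $1\le j\le k$.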
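Proof. The results follow easily from Proposition 4. For any $f\in\mathcal{G}$, Proposition 4 shows that

$$P(Tf\in CI^*_{j,\alpha/k})\ge1-\frac\alpha k.$$

Hence, for any $f\in\mathcal{G}$,

$$P(Tf\in CI^*_\alpha)=1-P(Tf\notin CI^*_\alpha)\ge1-\sum_{j=1}^kP(Tf\notin CI^*_{j,\alpha/k})\ge1-\alpha.$$

For the expected length note that

$$L(CI^*_\alpha,\mathcal{F}_j)\le L(CI^*_{j,\alpha/k},\mathcal{F}_j)\le\Bigl\{8\,\frac{\sqrt{\log(k+1)}}{z_{\alpha/2k}}+4\Bigr\}\,\omega_+\Bigl(\frac{z_{\alpha/2k}}{\sqrt n},\mathcal{F}_j,\mathcal{G}\Bigr)$$

for any $1\le j\le k$. For $0<\alpha<0.5$, calculations show that

$$\frac{\sqrt{\log(k+1)}}{z_{\alpha/2k}}\le1,$$

and hence

$$L(CI^*_\alpha,\mathcal{F}_j)\le12\,\omega_+\Bigl(\frac{z_{\alpha/2k}}{\sqrt n},\mathcal{F}_j,\mathcal{G}\Bigr).\tag{62}$$

The theorem now follows by combining (7), (20) and (62). □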


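Remark 7. It follows from Lemma 6 that $z_{\alpha/2k}\le\sqrt{2\log k/z_{\alpha/2}^2+1}\;z_{\alpha/2}$. Hence it follows from (62) and (20) that

$$L(CI^*_\alpha,\mathcal{F}_j)\le12\,\omega_+\Bigl(\sqrt{\frac{2\log k}{z_{\alpha/2}^2}+1}\;\frac{z_{\alpha/2}}{\sqrt n},\mathcal{F}_j,\mathcal{G}\Bigr)\le12\sqrt{\frac{2\log k}{z_{\alpha/2}^2}+1}\;\omega_+\Bigl(\frac{z_{\alpha/2}}{\sqrt n},\mathcal{F}_j,\mathcal{G}\Bigr).$$

The ratio of the upper bound just given to the lower bound in (53) is thus clearly bounded by a constant multiple of $\sqrt{\log k}$.

Section 5.2 gives an example of a nearly black object which shows that this $\sqrt{\log k}$ factor cannot in general be improved.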
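Taking $a=z_{\alpha/2}$ in Lemma 6 and choosing $b$ so that $ab+\frac12b^2=\log k$, that is $(a+b)^2=z_{\alpha/2}^2+2\log k$, gives $P(Z>a+b)\le\alpha/(2k)$ and hence $z_{\alpha/2k}\le\sqrt{z_{\alpha/2}^2+2\log k}$. The sketch below (names ours) checks this numerically over a range of $k$ and shows that $z_{\alpha/2k}/\sqrt{\log k}$ stays bounded.

```python
import math
from statistics import NormalDist

def z_upper(p):
    """z_p with P(Z > z_p) = p, computed as -Phi^{-1}(p) for accuracy at small p."""
    return -NormalDist().inv_cdf(p)

def bound_holds(alpha, k):
    """Check z_{alpha/2k} <= sqrt(z_{alpha/2}^2 + 2 log k)."""
    return z_upper(alpha / (2 * k)) <= math.sqrt(z_upper(alpha / 2) ** 2 + 2 * math.log(k))

if __name__ == "__main__":
    for k in (2, 10, 1000, 10**6):
        print(k, round(z_upper(0.05 / (2 * k)), 3), round(z_upper(0.05 / (2 * k)) / math.sqrt(math.log(k)), 3))
```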


5. Minimax confidence interval for nonconvex parameter spaces. As mentioned in the Introduction, Donoho (1994) constructed for any convex parameter space $\mathcal{F}$ fixed length intervals centered at affine estimators which have length within a small constant factor of the minimax expected length $L^*_\alpha(\mathcal{F})$. Although the focus of the present paper is on adaptation, the adaptation theory developed in the previous sections can also be used to yield a minimax theory for parameter spaces that are finite unions of convex parameter spaces. In this section confidence intervals with a specified coverage probability are given which also have near optimal maximum expected length. It is also shown, in contrast to the theory for convex parameter spaces, that optimal confidence intervals centered on affine estimators can have expected length much longer than the expected length of optimal confidence intervals centered at nonlinear estimators.

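Let $\mathcal{F}_i$, $i=1,\dots,k$, be convex parameter spaces with $\mathcal{F}_i\cap\mathcal{F}_j\ne\emptyset$ for all $i,j$ and let $\mathcal{G}=\bigcup_{i=1}^k\mathcal{F}_i$. Note that the parameter space $\mathcal{G}$ is in general nonconvex. The minimax expected length of confidence intervals $CI\in\mathcal{I}_{\alpha,\mathcal{G}}$ can be bounded above and below as follows.

Set $0<\alpha<\frac12$ and let $CI$ be a $1-\alpha$ level confidence interval for all $f\in\mathcal{G}=\bigcup_{i=1}^k\mathcal{F}_i$. It follows from Theorem 1 that the maximum expected length of $CI\in\mathcal{I}_{\alpha,\mathcal{G}}$ is bounded below by

$$L(CI,\mathcal{G})\ge\Bigl(\frac12-\alpha\Bigr)\omega\Bigl(\frac{z_\alpha}{\sqrt n},\mathcal{G}\Bigr).\tag{63}$$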


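Upper bounds on the minimax expected length can be obtained by considering the confidence interval $CI^*_\alpha$ as defined in (60). As shown in Theorem 3, this interval has coverage probability of at least $1-\alpha$ over $\mathcal{G}$. In addition, it follows from (61) that the maximum of the expected length of $CI^*_\alpha$ over $\mathcal{G}$ satisfies

$$L(CI^*_\alpha,\mathcal{G})=\max_{1\le j\le k}L(CI^*_\alpha,\mathcal{F}_j)\le12\max_{1\le j\le k}\omega_+\Bigl(\frac{z_{\alpha/2k}}{\sqrt n},\mathcal{F}_j,\mathcal{G}\Bigr)=12\,\omega\Bigl(\frac{z_{\alpha/2k}}{\sqrt n},\mathcal{G}\Bigr).\tag{64}$$

Hence, (63) and (64) together yield the following result on the minimax expected length of $1-\alpha$ level confidence intervals over $\mathcal{G}$.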


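Theorem 4. Let $\mathcal{G}=\bigcup_{j=1}^k\mathcal{F}_j$ where, for $j=1,\dots,k$, the $\mathcal{F}_j$ are convex spaces with nonempty intersections, and suppose $0<\alpha<\frac12$. Then

$$\Bigl(\frac12-\alpha\Bigr)\omega\Bigl(\frac{z_\alpha}{\sqrt n},\mathcal{G}\Bigr)\le L^*_\alpha(\mathcal{G})\le12\,\omega\Bigl(\frac{z_{\alpha/2k}}{\sqrt n},\mathcal{G}\Bigr).\tag{65}$$

Hence, the confidence interval $CI^*_\alpha$ attains the optimal rate of convergence for the maximum expected length over the parameter space $\mathcal{G}$ when the number of convex subspaces is fixed and finite.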
The example of confidence intervals in Section 5.2 for a linear functional of nearly black objects shows that the factor $z_{\alpha/2k}\asymp\sqrt{\log k}$ in the upper bound of (65) cannot be dropped in general when the number $k$ of convex subspaces grows with $n$.

5.1. Confidence intervals centered at affine estimators. We now consider the performance of confidence intervals centered at affine estimators over nonconvex parameter spaces. As mentioned earlier, when the parameter space $\mathcal{F}$ is assumed to be fixed and convex, Donoho (1994) and Theorem 1 given in Section 2 together show that the length of the shortest fixed length confidence interval centered on an affine estimator is within a fixed constant factor of the maximum expected length of the optimal confidence interval. Hence there is relatively little to gain by looking beyond the class of fixed length confidence intervals centered on affine estimators.

The following theorem considers the case when the parameter space is nonconvex. Once again let $\mathrm{C.Hull}(\mathcal{F})$ denote the convex hull of a parameter space $\mathcal{F}$.

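Theorem 5. Consider the white noise model (1) or the sequence model (2). Let $\hat T$ be an affine estimator of $Tf$ and $\gamma\ge0$ a nonnegative random variable. If $CI=[\hat T-\gamma,\hat T+\gamma]$ is a (variable length) confidence interval centered at $\hat T$ and $CI\in\mathcal{I}_{\alpha,\mathcal{F}}$, then

$$L(CI,\mathcal{F})\ge C(\alpha)\,\omega\Bigl(\frac{2z_{\alpha/2}}{\sqrt n},\mathrm{C.Hull}(\mathcal{F})\Bigr),\tag{66}$$

where $C(\alpha)>0$ is a constant depending on $\alpha$ only. In particular, if the interval $CI$ is of fixed length, then

$$L(CI)\ge\frac12\,\omega\Bigl(\frac{2z_{\alpha/2}}{\sqrt n},\mathrm{C.Hull}(\mathcal{F})\Bigr).\tag{67}$$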

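Proof. It is shown in Cai and Low (2004) that the affine estimator $\hat T$ satisfies

$$\sup_{f\in\mathcal{F}}|E\hat T-Tf|=\sup_{f\in\mathrm{C.Hull}(\mathcal{F})}|E\hat T-Tf|.$$

It then follows from Theorem 2 of Low (1995) that $\hat T$ must satisfy either

$$\sup_{f\in\mathcal{F}}|E\hat T-Tf|\ge\frac14\,\omega\Bigl(\frac{2z_{\alpha/2}}{\sqrt n},\mathrm{C.Hull}(\mathcal{F})\Bigr)\tag{68}$$

or

$$\sigma_{\hat T}\ge\frac{1}{4z_{\alpha/2}}\,\omega\Bigl(\frac{2z_{\alpha/2}}{\sqrt n},\mathrm{C.Hull}(\mathcal{F})\Bigr),\tag{69}$$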
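where $\sigma_{\hat T}$ denotes the standard deviation of the estimator $\hat T$. We now consider the two cases separately. If (68) holds, then for any $\varepsilon>0$ there exists $f\in\mathcal{F}$ such that

$$B_f=|E\hat T-Tf|\ge\frac14\,\omega\Bigl(\frac{2z_{\alpha/2}}{\sqrt n},\mathrm{C.Hull}(\mathcal{F})\Bigr)-\varepsilon.\tag{70}$$

Since $CI=[\hat T-\gamma,\hat T+\gamma]$ has minimum coverage probability of at least $1-\alpha$ over $\mathcal{F}$,

$$\begin{aligned}
1-\alpha&\le P_f(|\hat T-Tf|\le\gamma)\\
&=P_f(|\hat T-Tf|\le\gamma\text{ and }\gamma\le B_f)+P_f(|\hat T-Tf|\le\gamma\text{ and }\gamma>B_f)\\
&\le P_f(|\hat T-Tf|\le B_f)+P_f(\gamma>B_f).
\end{aligned}$$

Since $\hat T$ is an affine estimator and thus has a normal distribution, it is easy to check that $P_f(|\hat T-Tf|\le B_f)\le1/2$, and hence

$$P_f(\gamma>B_f)\ge\frac12-\alpha.\tag{71}$$

Letting $\varepsilon\to0$ in (70), it then follows that

$$\sup_{f\in\mathcal{F}}E_fL(CI)=\sup_{f\in\mathcal{F}}2E_f(\gamma)\ge\sup_{f\in\mathcal{F}}2B_fP_f(\gamma>B_f)\ge\frac{1-2\alpha}{4}\,\omega\Bigl(\frac{2z_{\alpha/2}}{\sqrt n},\mathrm{C.Hull}(\mathcal{F})\Bigr).$$

If (69) holds, we have, for $f\in\mathcal{F}$,

$$\begin{aligned}
1-\alpha&\le P_f(|\hat T-Tf|\le\gamma)=P\Bigl(\frac{-\gamma-(E\hat T-Tf)}{\sigma_{\hat T}}\le Z\le\frac{\gamma-(E\hat T-Tf)}{\sigma_{\hat T}}\Bigr)\le P\Bigl(|Z|\le\frac{\gamma}{\sigma_{\hat T}}\Bigr)\\
&=P\Bigl(|Z|\le\frac{\gamma}{\sigma_{\hat T}}\text{ and }\gamma\le z_{0.25}\sigma_{\hat T}\Bigr)+P\Bigl(|Z|\le\frac{\gamma}{\sigma_{\hat T}}\text{ and }\gamma>z_{0.25}\sigma_{\hat T}\Bigr)\\
&\le P(|Z|\le z_{0.25})+P(\gamma>z_{0.25}\sigma_{\hat T}),
\end{aligned}$$

where $Z$ denotes a standard normal random variable. Hence, since $P(|Z|\le z_{0.25})=\frac12$,

$$P(\gamma>z_{0.25}\sigma_{\hat T})\ge\frac12-\alpha.$$

Consequently,

$$E_fL(CI)=2E_f(\gamma)\ge2z_{0.25}\sigma_{\hat T}P(\gamma>z_{0.25}\sigma_{\hat T})\ge(1-2\alpha)z_{0.25}\,\sigma_{\hat T}\ge\frac{(1-2\alpha)z_{0.25}}{4z_{\alpha/2}}\,\omega\Bigl(\frac{2z_{\alpha/2}}{\sqrt n},\mathrm{C.Hull}(\mathcal{F})\Bigr)\ge\frac{1-2\alpha}{10z_{\alpha/2}}\,\omega\Bigl(\frac{2z_{\alpha/2}}{\sqrt n},\mathrm{C.Hull}(\mathcal{F})\Bigr).$$

Equation (66) now follows by taking $C(\alpha)=\min\bigl\{\frac{1-2\alpha}{4},\frac{1-2\alpha}{10z_{\alpha/2}}\bigr\}$. Equation (67) for the fixed length case is easier to prove and we omit the proof here.

□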
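Two normal-distribution facts carry the proof of Theorem 5: an estimator whose bias is $B$ lands within $B$ of the target with probability at most $1/2$, since $P(|X|\le B)=\frac12-\Phi(-2B/\sigma)$ for $X\sim N(B,\sigma^2)$; and $P(|Z|\le z_{0.25})=\frac12$ with $z_{0.25}/4\ge1/10$. A short check (standard library; names are ours):

```python
from statistics import NormalDist

STD = NormalDist()

def prob_within_bias(bias, sd):
    """P(|X| <= |bias|) for X ~ N(bias, sd^2), i.e. Phi(0) - Phi(-2|bias|/sd)."""
    b = abs(bias)
    return STD.cdf(0.0) - STD.cdf(-2.0 * b / sd)

z25 = STD.inv_cdf(0.75)  # z_{0.25}: P(Z > z_{0.25}) = 0.25

if __name__ == "__main__":
    print(prob_within_bias(1.0, 1.0), z25, z25 / 4.0)
```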


Remark 8. Theorem 5 shows that the minimax expected length of confidence intervals centered at affine estimators is determined by the modulus of continuity over the convex hull of $\mathcal{F}$, not over $\mathcal{F}$ itself. In the case that $\omega(\varepsilon,\mathrm{C.Hull}(\mathcal{F}))\gg\omega(\varepsilon,\mathcal{F})$, any confidence intervals centered at affine estimators will perform poorly. Such is the case in the nearly black object example given in the next section.

5.2. Nearly black object. In this section an example is given which shows that the factor $z_{\alpha/2k}\asymp\sqrt{\log k}$ in the upper bound of the minimax expected length given in Theorem 4 cannot in general be dropped. It is also shown that confidence intervals centered at affine estimators are far from optimal.

Consider the Gaussian sequence model (2) with the index set $\mathcal{M}=\{1,2,\dots,n\}$, namely

$$Y(i)=f(i)+n^{-1/2}Z_i,\qquad i=1,\dots,n,\tag{72}$$

where $Z_i\overset{\text{i.i.d.}}{\sim}N(0,1)$. The size of the vector, $n$, is assumed large. We assume that the vector $f$ is sparse: only a small fraction of components are nonzero, and the indices or locations of the nonzero components are not known in advance.

Denote the $\ell_0$ quasi-norm by $\|f\|_0=\mathrm{Card}(\{i:f(i)\ne0\})$. Fix $m_n$. The collection of vectors with at most $m_n$ nonzero entries is

$$\mathcal{G}=\ell_0(m_n)=\{f\in\mathbb{R}^n:\|f\|_0\le m_n\}.$$

Assume that $m_n$ is known and $m_n\le n^\gamma$ where $\gamma<\frac12$.

Such an example is considered in Cai and Low (2004) in the context of minimax estimation. The model, which arises naturally in wavelet analysis, has also been studied in Donoho, Johnstone, Hoch and Stern (1992) and Abramovich, Benjamini, Donoho and Johnstone (2000) for estimating the whole object.

Let the linear functional $Tf$ be given by

$$Tf=\sum_{i=1}^nf(i),$$

and following Cai and Low (2004) let $\mathcal{I}(m_n,n)$ be the class of all subsets of $\{1,\dots,n\}$ of $m_n$ elements and for $I\in\mathcal{I}(m_n,n)$ let

$$\mathcal{F}_I=\{f\in\mathbb{R}^n:f(i)=0\text{ for all }i\notin I\}.$$

Note that $\mathcal{F}_I$ is an $m_n$-dimensional subspace spanned by the coordinates in $I$. These are obviously convex and $\mathcal{G}=\bigcup\mathcal{F}_I$, where the union is taken over $I$ in the set $\mathcal{I}(m_n,n)$. From now on we shall assume that $I$ is in the set $\mathcal{I}(m_n,n)$.

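Simple calculations show that for all $I,J\in\mathcal{I}(m_n,n)$

$$\omega(\varepsilon,\mathcal{F}_I,\mathcal{F}_J)=\sqrt{\mathrm{Card}(I\cup J)}\,\varepsilon$$

and consequently

$$\omega(\varepsilon,\mathcal{F}_I,\mathcal{G})=\omega(\varepsilon,\mathcal{G},\mathcal{F}_I)=\omega(\varepsilon,\mathcal{G})=\sqrt{2m_n}\,\varepsilon.$$

Remark 9. It is easy to see that $\mathrm{C.Hull}(\mathcal{G})=\mathbb{R}^n$ and hence $\omega(\varepsilon,\mathrm{C.Hull}(\mathcal{G}))=\sqrt n\,\varepsilon$. It follows from Theorem 5 and (66) that any confidence interval with coverage of at least $1-\alpha$ centered at an affine estimator must have maximum expected length bounded from below by a fixed constant not depending on $n$.

Let $k$ be the number of the $m_n$-dimensional parameter spaces $\mathcal{F}_I$. Then $k$ is equal to $n$ choose $m_n$, and it is easy to see that

$$k=\binom{n}{m_n}\le n^{m_n}.$$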
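The gap between the two moduli is what makes affine-centred intervals poor here: at $\varepsilon=z/\sqrt n$, $\omega(\varepsilon,\mathcal{G})=\sqrt{2m_n}\,z/\sqrt n\to0$ while $\omega(\varepsilon,\mathrm{C.Hull}(\mathcal{G}))=z$ does not shrink. This sketch (names ours) evaluates both, and also checks $\log k=\log\binom{n}{m_n}\le m_n\log n$ with exact integer binomials.

```python
import math

def log_k(n, m):
    """log of k = C(n, m), the number of m-dimensional coordinate subspaces F_I."""
    return math.log(math.comb(n, m))

def moduli(eps, n, m):
    """(omega(eps, G), omega(eps, C.Hull(G))) = (sqrt(2m) eps, sqrt(n) eps) for T f = sum_i f(i)."""
    return math.sqrt(2 * m) * eps, math.sqrt(n) * eps

if __name__ == "__main__":
    n, m, z = 10**5, 10, 1.96
    print(log_k(n, m), m * math.log(n))
    print(moduli(z / math.sqrt(n), n, m))
```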

The following result gives a lower bound on the expected length of any confidence interval with a minimum coverage probability of $1-\alpha$ over $\mathcal{G}$.

Proposition 5. Suppose that we observe the Gaussian sequence model (72), that $n\ge4$ and $m_n\le n^\gamma$ with $\gamma<\frac12$. Let $Tf=\sum_{i=1}^nf(i)$ and $0<\alpha<\frac12$. Suppose that $CI(Y)$ is a confidence interval for $Tf$ based on (72) and $CI(Y)\in\mathcal{I}_{\alpha,\mathcal{G}}$. Then for all sufficiently large $n$,


(73) 


E 0 L(CI(Y))> 

> 



where $E_0$ denotes expectation under the Gaussian model (72) with $f(i)=0$ for $i=1,2,\dots,n$.


Remark 10. It follows immediately from (73) that the maximum expected length of $CI(Y)$ over $\mathcal{G}$ satisfies

$$L(CI(Y),\mathcal{G})\ge C\,\omega\Bigl(\frac{\sqrt{\log k}}{\sqrt n},\mathcal{G}\Bigr).\tag{74}$$
Comparing the lower bound (74) for the maximum expected length with the minimax upper bound given in (65) shows that the factor $\sqrt{\log k}$ in the upper bound for the minimax expected length cannot be dropped in general. A similar result also holds for adaptation.


Proof of Proposition 5. In the following proof the calculation of the $L_1$ distance between a mixture of normals and a given normal distribution follows a similar calculation used in Cai and Low (2004). We include the details of the calculation here for completeness. In the proof we will omit the subscript in $m_n$ and simply write $m$ for $m_n$. Let $\psi_f$ be the joint density of the Gaussian observations given in (72). More specifically, $\psi_f$ is a multivariate normal density with mean $(f(1),f(2),\dots,f(n))$ and covariance matrix $n^{-1}I_n$, where $I_n$ is the $n\times n$ identity matrix. Fix a constant $\rho>0$. For $I\in\mathcal{I}(m,n)$ let $f_I$ be defined by $f_I(j)=\frac{\rho}{\sqrt n}1(j\in I)$ and let $f_0$ be the sequence defined by $f_0(j)=0$ for $j=1,2,\dots,n$. Finally let

$$\psi^*=\frac{1}{\binom nm}\sum_{I\in\mathcal{I}(m,n)}\psi_{f_I}.$$

Note that a similar mixture prior was used in Baraud (2002) to give lower bounds in a nonparametric testing problem. Note that for all $I$, $Tf_I=m\rho/\sqrt n$ and that $Tf_0=0$. Note also that if

$$P_{f_I}\Bigl(\frac{m\rho}{\sqrt n}\in CI(Y)\Bigr)\ge1-\alpha$$

for all $I\in\mathcal{I}(m,n)$, then it follows that

$$P_{\psi^*}\Bigl(\frac{m\rho}{\sqrt n}\in CI(Y)\Bigr)=\frac{1}{\binom nm}\sum_{I\in\mathcal{I}(m,n)}P_{f_I}\Bigl(\frac{m\rho}{\sqrt n}\in CI(Y)\Bigr)\ge1-\alpha.$$

Note that

$$\int\frac{\psi^{*2}}{\psi_{f_0}}=\frac{1}{\binom nm^2}\sum_{I,I'\in\mathcal{I}(m,n)}\int\frac{\psi_{f_I}\psi_{f_{I'}}}{\psi_{f_0}},$$

and simple calculations show that

$$\int\frac{\psi_{f_I}\psi_{f_{I'}}}{\psi_{f_0}}=\exp(j\rho^2),$$

where $j$ is the number of points in the set $I\cap I'$. It follows that

$$\int\frac{\psi^{*2}}{\psi_{f_0}}=E\exp(J\rho^2),$$
where $J$ has a hypergeometric distribution,

$$P(J=j)=\frac{\binom mj\binom{n-m}{m-j}}{\binom nm}.$$

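Now note that from Feller [(1968), page 59],

$$P(J=j)\le\binom mj\Bigl(\frac mn\Bigr)^j\Bigl(1-\frac mn\Bigr)^{m-j}\Bigl(1-\frac mn\Bigr)^{-m}.$$

Now suppose that $n\ge4$ and that $m\le n^{1/2}$. Then

$$\Bigl(1-\frac mn\Bigr)^{-m}\le4^{m^2/n}$$

and hence

$$P(J=j)\le4^{m^2/n}\binom mj\Bigl(\frac mn\Bigr)^j\Bigl(1-\frac mn\Bigr)^{m-j}.$$

It now follows that if $n\ge4$ and $m\le n^\gamma$ with $\gamma<\frac12$, then

$$\int\frac{\psi^{*2}}{\psi_{f_0}}=E\exp(J\rho^2)\le4^{m^2/n}\Bigl(1-\frac mn+\frac mn\exp(\rho^2)\Bigr)^m\le4^{m^2/n}\Bigl(1+\frac mn\exp(\rho^2)\Bigr)^m.$$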


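Now take $\rho=\bigl(\frac12\log(n/m^2)\bigr)^{1/2}$, so that $\frac mn\exp(\rho^2)=n^{-1/2}$. Then

$$\int\frac{\psi^{*2}}{\psi_{f_0}}\le4^{m^2/n}\Bigl(1+\frac1{n^{1/2}}\Bigr)^m.$$

Hence we can bound the $L_1$ distance by

$$\int|\psi^*-\psi_{f_0}|\le\Bigl(\int\frac{\psi^{*2}}{\psi_{f_0}}-1\Bigr)^{1/2}\le\Bigl(4^{m^2/n}\Bigl(1+\frac1{n^{1/2}}\Bigr)^m-1\Bigr)^{1/2}\to0.$$

So for any $0<\varepsilon<1-2\alpha$ there exists $n_\varepsilon$ such that for all $n\ge n_\varepsilon$, $\int|\psi^*-\psi_{f_0}|\le\varepsilon$.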
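For moderate $n$ the chi-square quantity $E\exp(J\rho^2)$ can be computed exactly from the hypergeometric law with integer binomials, which makes the Feller-type bound and the decay of the $L_1$ bound concrete. A sketch with $\rho^2=\frac12\log(n/m^2)$; the function names are ours.

```python
import math

def hypergeom_pmf(j, n, m):
    """P(J = j) = C(m, j) C(n - m, m - j) / C(n, m): overlap of two random m-subsets of {1..n}."""
    return math.comb(m, j) * math.comb(n - m, m - j) / math.comb(n, m)

def chi2_plus_one(n, m, rho2):
    """Exact value of the integral of psi*^2 / psi_{f0}, namely E exp(J rho^2)."""
    return sum(hypergeom_pmf(j, n, m) * math.exp(j * rho2) for j in range(m + 1))

def feller_bound(n, m, rho2):
    """4^{m^2/n} (1 + (m/n) e^{rho^2})^m."""
    return 4.0 ** (m * m / n) * (1.0 + (m / n) * math.exp(rho2)) ** m

def l1_bound(n, m):
    """(E exp(J rho^2) - 1)^{1/2} with rho^2 = (1/2) log(n / m^2), an upper bound on the L1 distance."""
    return math.sqrt(chi2_plus_one(n, m, 0.5 * math.log(n / m**2)) - 1.0)

if __name__ == "__main__":
    for n in (10**4, 10**6):
        print(n, l1_bound(n, 5))
```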

It follows from the fact that $CI$ has minimum coverage probability of $1-\alpha$ and that the $L_1$ distance between $\psi_{f_0}$ and $\psi^*$ is bounded above by $\varepsilon$ that

$$P_{f_0}\Bigl(\frac{m\rho}{\sqrt n}\in CI(Y)\Bigr)\ge1-\alpha-\varepsilon.$$

Hence

$$P_{f_0}\Bigl(0\in CI\text{ and }\frac{m\rho}{\sqrt n}\in CI\Bigr)\ge1-2\alpha-\varepsilon.$$

Since $CI$ is an interval, its length must be at least $m\rho/\sqrt n$ when both $0$ and $m\rho/\sqrt n$ are in the interval. Hence for $n\ge n_\varepsilon$,

$$E_{f_0}L(CI(Y))\ge(1-2\alpha-\varepsilon)\,\frac{m}{\sqrt n}\Bigl(\frac12\log\frac{n}{m^2}\Bigr)^{1/2}.$$

Now take $\varepsilon=\frac12-\alpha$. Then for all sufficiently large $n$,



where $k$ is the number of convex parameter spaces in $\mathcal{G}$. □

Remark 11. It follows immediately from Proposition 5 that 

Hence the factor of $z_{\alpha/2k}\asymp\sqrt{\log k}$ for adaptation in the upper bound of Theorem 3 and the same factor for minimax confidence procedures in Theorem 4 cannot in general be removed.